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Abstract 

A  numerical  method  based  on  high  order  WENO  discretization  of  the 
compressible  Nervier-Stokes  equations  and  a  sub-grid  scale  model  for  LES 
was  tested  and  validated  for  the  simulation  of  transition  and  turbulence  of 
normal  and  oblique  shock  boundary  layer  interaction  over  a  flat  plate  for  a 
wide  range  of  Mach  numbers.  An  eddy  viscosity  sub-grid  scale  model  is  used 
for  large  eddy  simulations  (LES)  for  both  the  turbulent  and  transitional  flow 
regime.  The  numerical  simulations  show  that  good  agreement  with  available 
experimental  data  is  obtained  for  both  the  predicted  pressure  distribution 
and  velocity  profiles. 

1  Introduction 

The  discretization  of  the  inviscid  fluxes  is  based  on  standard  finite  difference 
WENO  schemes.  The  numerical  code  includes  options  for  5th,  7th  and  9th 
order  accurate  discretizations  of  the  inviscid  fluxes.  All  computations  shown 
here  were  performed  with  the  9th  order  accurate  WENO  scheme.  The  viscous 
fluxes  were  evaluated  with  a  4th  order  accurate  explicit,  central  difference 
scheme  by  evaluating  the  second  derivatives  with  repeated  evaluation  of  the 
first  derivative,  first  at  half  points  and  then  at  the  nodes  of  the  finite  differ¬ 
ence  mesh.  High  order  accurate  discretization  of  the  viscous  fluxes  requires 
large  computational  time.  It  was  found  that  for  a  fourth  order  accurate  ex¬ 
plicit  discretization,  significant  portion  of  computational  time  per  time  step 
is  spend  for  the  evaluation  of  the  viscous  fluxes  and  that  higher  order  explicit 
or  compact  discretizations  are  prohibitively  expensive. 

Implicit,  second  order  accurate  in  space  and  time,  time  stepping  methods 
exist  as  option  in  the  code.  However,  it  was  found  that  these  implicit  schemes 
are  not  as  accurate  for  time  integration  as  explicit  schemes,  and  in  addition, 
they  become  very  diffusive  and  not  appropriate  for  LES  at  the  presence  of 
strong  shocks.  Therefore  all  flows  were  computed  with  the  classical  tree 
stage,  third  order  accurate  TVD  Runge-Kutta  method  of  Osher  and  Shu. 
For  LES  computations  the  Smagorinsky  sub-grid  scale  model  is  used.  The 

1 


2 


particular  implementation  of  the  sub-grid  model  and  the  computed  results 
are  given  in  the  following  sections.  Finally,  the  results  and  the  conclusions 
of  this  study  are  presented. 


2  Numerical  Implementation  for  Large  Eddy 
Simulations 

The  Navier-Stokes  solver  is  based  on  WENO  discretization  for  the  inviscid 
fluxes  and  central  finite  difference  discretization  of  the  viscous  fluxes.  It  is 
parallelized  and  handles  only  block  structured  grids  for  the  numerical  solution 
of  the  Farve- averaged  continuity  momentum  and  energy  equations. 
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where  E  =  p/(y  —  1)  +  0.5/cmj'Uj  is  the  computable  energy  and  fi:j  is  the 
Farve-filtered  viscous  stress  tensor 
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where  the  molecular  viscosity  is  computed  from  the  filtered  temperature  from 
the  Sutherland  law 
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and  Sij  is  the  Farve-filtered  strain  rate  tensor  given  by 
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On  the  right  hand  side  of  the  energy  equation,  Eq. 
diffusive  terms  exist 

(2.3),  the  additional 
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these  terms  are  given  by 
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Vreman  et  al.  [1],  performed  a  priori  test  using  DNS  data  obtained  from 
the  calculation  of  a  compressible  mixing  layer,  and  concluded  that  these 
nonlinearities  have  negligible  effects  and  it  is  acceptable  to  neglect  them. 
For  high  Mach  number  LES,  is  not  absolutely  clear  that  the  contribution 
of  these  terms  is  still  negligible.  However,  because  a  suitable  sub-grid  scale 
model  is  not  available  for  these  terms  these  terms  are  also  neglected  for  the 
present  LES  for  computation  of  shock  boundary  layer  interaction. 

The  sub-grid-scale  viscous  tensor  ffGS  is  modelled  by 

TjGS  =  | pkSGS5i:j  (2.11) 

in  the  modelled  sub-grid-scale  tensor  rhGS  the  eddy  viscosity  /q  and  kSGS 
depend  on  the  sub-grid  scale  model  that  is  presented  in  the  next  section  and 
in  Eq.  (2.3)  the  sub-grid  heat  flux  gSGS  is  modelled  by 


qSGS  _ 


jM_dT_ 
'  Prf  dx  , 


(2.12) 


3 


4 


2.1  Sub-grid  Model  and  Constants 

The  compressible  version  of  the  Smagorinsky  sub-grid-scale  model  [2]  with 
fixed  filter  width  is  used.  The  constants  fit  and  kSGS  of  this  model  in  Eq. 
(2.11)  are  evaluated  as 


fit  =  CRpA2J§^j  (2.13) 

kSGS  =  C^SijSij  (2.14) 

where  the  constants  values  Cr  =  0.012  and  Cj  =  0.0066  are  used  and  the 
filter  width  A  is  chosen  as  A  =  (Ax  x  Ay  x  Az)^1/3) 

The  Smagorinsky  sub-grid-scale  model  is  not,  however,  accurate  for  wall 
bounded  flows.  The  deficiencies  of  this  model  can  be  improved  when  a  length 
scale  filter  close  to  the  wall  is  introduced.  Therefore  the  modified  version  of 
the  Nicoud  and  Ducros  [3]  model  was  also  implemented.  The  eddy  viscosity 
/it  of  this  model  is  given  by 


(  qd  qd  \3/2 
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where  the  constant  value  is  Cw  =  0.3  and 
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The  sub-grid-scale  viscous  tensor  ?fGS 
Nicoud  and  Ducros  [3]  model  is 


of  compressible  version  of  the 


fgGS  =  im(2. Sv  -  (2.17) 

where  the  constant  value  is  Q  =  45.8. 


3  Results 

The  numerical  algorithm  with  the  sub-grid  scale  model,  which  was  described 
for  completeness  in  the  previous  section,  was  used  to  perform  DNS  and  LES 
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of  shock  boundary  layer  interaction.  Sample  results  of  these  computations 
are  presented  in  this  section.  The  grid  used  for  the  simulations  and  the 
boundary  conditions  are  described  first  and  then  the  computed  results  are 
presented. 

Oblique  shock  laminar  boundary  layer  interactions  for  several  Mach  num¬ 
bers  were  computed  with  DNS  and  no  sub-grid  model  was  used.  The  oblique 
shock  is  generated  using  isentropic  flow  relations.  The  shock  boundary  layer 
interaction  takes  place  in  a  computational  domain  with  length  four  units  and 
hight  one  unit.  The  spanwise  extend  of  the  domain  is  one  unit.  For  all  com¬ 
putations  the  same  grid  density  was  used.  The  characteristics  of  the  mesh 
are  as  follows.  In  the  streamwise  direction  a  uniform  mesh  was  used  with  125 
points  per  unit  length  or  Ax  =  0.008.  Along  the  normal  to  the  wall  direc¬ 
tion  200  intervals  were  used  and  the  spacing  of  the  first  point  away  from  the 
surface  was  5  x  10-4.  This  grid  density  guarantees  almost  100  points  within 
the  boundary  layer.  Along  the  periodic  spanwise  direction  fewer  intervals 
were  used  for  cases  where  we  performed  DNS  because  the  objective  was  to 
capture  only  the  initiation  of  transition.  For  LES  computations,  however, 
the  spacing  along  the  spanwise  was  the  same  as  in  the  streamwise  direction. 
The  oblique  shock  angle  was  set  29°  for  all  Mach  numbers.  For  this  angle  of 
the  oblique  shock,  the  wedge  angle  f3  and  the  inflow  boundary  conditions  at 
the  top  of  the  domain  for  various  Mach  numbers  are  specified  as  follows: 


Moo 

P 

Ptop 

^top 

^top 

Ptop 

2.9 

10.94 

1.69997 

2.61934 

-0.50632 

1.52819 

4.0 

16.80 

2.57560 

4.05234 

-1.22766 

5.90900 

5.0 

19.3 

3.24201 

4.95422 

-1.73494 

9.36656 

Oblique  shock  boundary  layer  interaction  results  in  flow  separation  as  will  be 
shown  later.  At  the  both  the  inflow  and  the  outflow  the  flow  is  supersonic, 
therefore  at  the  inflow  supersonic  free  stream  or  boundary  layer  was  specified 
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and  at  the  outflow  all  quantities  were  extrapolated  from  the  interior.  In  addi¬ 
tion,  at  the  exit  a  sponge  region  of  15  cells  was  used  to  completely  eliminate 
reflections  that  may  result  from  the  subsonic  portion  of  the  boundary  layer. 

Normal  shock  boundary  layer  interaction  was  also  computed  with  LES 
at  Mqo  =  1-1  and  M^  =  1.3.  For  the  lower  Mach  number  case,  shock 
boundary  layer  interaction  without  separation  was  found.  For  the  higher 
Mach  number,  however,  significant  separation  and  ”A”  shock  formation  was 
found  in  accordance  with  the  experiment.  The  computational  box  has  di¬ 
mensions  Lx  =  20  x  Ly  =  10  x  Lz  =  20  and  a  401  x  81  x  151  point  mesh  was 
used.  This  mesh  is  uniform  in  the  streamwise  and  spanwise  directions  and 
stretched  in  the  normal  to  the  wall  direction  with  grid  spacing  for  the  first 
point  1  x  10-4.  The  boundary  conditions  at  the  top  of  the  domain  for  normal 
shock  boundary  layer  interaction  were  specified  from  the  isentropic  normal 
shock  relations.  At  the  inflow,  the  flow  is  supersonic  and  laminar  bound¬ 
ary  with  disturbances  for  the  streamwise  and  normal  velocity  components 
[5]  was  specified.  At  the  outflow,  the  flow  is  subsonic  therefore  nonreflective 
boundary  conditions  were  specified.  In  addition,  at  the  exit  a  sponge  region 
of  15  cells  was  used  to  completely  eliminate  reflections. 

3.1  Oblique  shock  boundary  layer  interaction 

Results  for  oblique  shock  laminar  boundary  layer  interaction  are  shown  for 
Mqo  =  2.9  and  M^  =  5.0.  The  flow  field  structure  of  the  computed  field 
in  the  middle  of  the  computational  domain  is  summarized  in  Fig.  1.  The 
formation  of  the  compression  waves  over  the  boundary  layer  before  and  after 
the  interaction  region  are  evident.  At  the  interaction  region  the  bound¬ 
ary  layer  thickness  increases  significantly.  After  interaction  the  compression 
waves  converge  to  a  reflected  shock.  In  the  interaction  region,  massive  flow 
separation  is  observed  for  all  Mach  number.  Downstream  the  interaction 
region  development  of  disturbances  was  found. 

Several  tests  were  carried  out  to  ensure  that  these  disturbances  are  inter¬ 
dependent  of  the  inflow  boundary  conditions.  Numerical  simulations  were 
carried  out  for  longer  domains.  The  inflow  was  placed  at  Xin  =  —  2  or 
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xin  =  —  1  while  the  oblique  shock  was  still  specified  at  x  =  0.  z  =  1.  Simula¬ 
tions  were  carried  out  for  both  fully  developed  compressible  boundary  layer 
at  Moo  =  2.9  and  developing  boundary  layer  from  the  leading  edge.  In  all 
cases  the  input  parameter,  which  is  the  Reynolds  number  per  unit  length, 
was  adjusted  so  that  at  x  =  1.0  the  Reynolds  is  Re  =  104.  The  contour  plots 
in  Fig.  1  correspond  for  xin  =  —  1. 

The  details  of  oblique  shock  laminar  boundary  layer  interaction  at  = 
2.9  are  shown  in  Fig.  2.  The  formation  of  the  compression  waves  ahead  and 
downstream  from  the  interaction  region  are  indicated  by  the  smooth  turning 
of  the  streamlines.  In  the  interaction  region  significant  flow  separation  occurs 
and  the  boundary  layer  reaches  about  ten  times  the  thickness  before  the 
interaction  region.  The  simulations  shown  in  Figs.  1  and  2  correspond  to 
adiabatic  wall  boundary  condition. 

The  effect  of  wall  temperature  on  the  boundary  layer  thickens  and  flow 
field  structure  is  shown  in  Fig.  3.  The  wall  temperature  from  the  adiabatic 
wall  boundary  condition  was  found  approximately  Tw  =  2.5^.  Three  simu¬ 
lations  with  constant  wall  temperature  were  carried  out;  (1)  for  wall  heated 
at  constant  temperature  Tw  =  3.5 T^;  (2)  for  wall  at  a  temperature  approxi¬ 
mately  equal  to  the  average  adiabatic  wall  temperature  Tw  =  2.5T00]  and  (3) 
for  wall  with  cooling  at  temperature  bellow  the  adiabatic  wall  temperature 
Tw  =  1.5Too.  As  expected  rise  of  the  wall  temperature  increases  both  the 
thickness  of  the  boundary  layer  and  interaction  region  while  cooling  of  the 
wall  results  into  thinner  boundary  layer.  The  structure  of  the  flow  field  is  the 
same  for  constant  wall  temperature  close  to  the  adiabatic  wall  temperature. 

Flow  disturbances  were  monitored  in  time  upstream  and  downstream  of 
the  interaction  region  in  the  region  where  the  boundary  layer  is  fully  attached. 
Self-excited  disturbances  downstream  the  interaction  region  are  shown  in  Fig. 
4.  For  oblique  shock  boundary  layer  interaction  at  =  2.9  the  amplitude 
of  these  disturbances  is  small  and  are  not  visible  on  the  contour  field  plots. 
However  as  we  will  show  later  the  amplitude  of  these  disturbances  becomes 
quite  large  and  they  become  visible  as  the  free  steam  Mach  number  increases. 

The  general  features  of  the  computed  flow  field  in  the  middle  of  the  com- 
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putational  domain  for  oblique  shock  laminar  boundary  layer  interaction  at 
Mqo  =  5-0  are  shown  in  Fig.  5.  The  formation  of  the  compression  waves 
over  the  boundary  layer  before  and  after  the  interaction  region  is  again  ev¬ 
ident.  The  details  of  oblique  shock  laminar  boundary  layer  interaction  at 
Mqo  =  5.0  are  shown  in  Figs.  6  and  7.  The  formation  of  the  compression 
waves  waves  ahead  and  downstream  from  the  interaction  region  are  indicated 
by  the  smooth  turning  of  the  streamlines.  In  the  interaction  region  signifi¬ 
cant  flow  reparation  occurs  and  the  boundary  layer  reaches  many  times  the 
thickness  before  and  after  the  interaction  region.  The  disturbances  down¬ 
stream  of  the  interaction  region  are  visible.  These  disturbances  lead  to  rapid 
transition  into  turbulent  flow. 

3.2  LES  for  normal  shock  boundary  layer  interaction 

Simulations  for  normal  shock  boundary  layer  interaction  were  carried  out 
for  Mqo  =  1-1  and  M^  =  1.3.  For  these  simulations  at  the  inflow  a  com¬ 
pressible  boundary  layer  profile  was  prescribed  at  Ae,5=5000,  10000,  and 
20000.  At  Ae,5=5000,  and  10000  transition  was  found  after  the  interaction 
at  i?e,5=20000  turbulent  flow  developed  before  the  interaction  region.  For 
Mqo  =  1.1  normal  shock  boundary  layer  interaction  the  flow  at  the  interac¬ 
tion  region  was  attached  in  the  mean.  For  M^  =  1.3  normal  shock  boundary 
layer  interaction  the  flow  in  the  interaction  region  was  separated  and  a  A-type 
shock  was  formed. 

The  flow  field  structure  for  normal  shock  boundary  layer  interaction  at 
Mqo  =  1-1  is  shown  with  instantaneous  iso-surfaces  of  density  and  pressure 
in  Figs.  8  and  9,  respectively.  The  flow  field  structure  for  normal  shock 
boundary  layer  interaction  at  A  Ac  =  1-3  is  presented  next.  For  Res=5000 
(see  Fig.  10),  flow  instabilities  are  visible  before  the  interaction  region  and 
transition  occurs  after  the  interaction  region.  Larger  scale  flow  instabilities 
in  the  transitional,  and  turbulent  flow  regimes  are  shown  in  Figs.  11  and  12. 
At  /?,c,5= 10000  (see  Fig.  11)  transition  occurs  in  the  separated  flow  region 
bellow  the  A  shock  where  the  formation  of  hairpin  vortices  is  evident.  At 
i?e,5=10000  (see  Fig.  12)  transition  occurs  at  the  A  shock  boundary  layer 
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interaction  region. 

Comparisons  of  the  average  surface  pressure  distribution  and  the  axial 
velocities  at  several  locations  with  the  experiment  are  shown  in  Figs.  13  and 
14.  The  computed  surface  pressure  distribution  is  in  good  agreement  with 
the  measurements.  The  mean  velocity  distribution  is  also  in  agreement  with 
the  experiment  and  it  appears  that  the  main  flow  features  are  well  captured 
by  the  LES. 

4  Conclusions 

The  high  order  accurate  finite  difference  method  we  developed  is  based  on 
WENO  discretization  of  the  inviscid  fluxes  and  fourth  order  accurate  dis¬ 
cretization  of  the  viscous  terms  with  the  Smagorinsky  eddy  viscosity  model 
for  sub-grid  scales.  This  method  was  fount  effective  and  accurate  for  the 
computation  of  transition  and  turbulence  in  high  speed  compressible  flows. 
Simulations  for  shock  boundary  layer  interaction  showed  good  agreement 
with  experimental  measurements. 
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Figure  1.  Overview  of  the  computed  solution  for  oblique  shock  boundary  layer 
interaction  at  Mx  =  2.9  and  incident  shock  angle  6  =  29° . 
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Figure  2.  Flow  field  detail  for  shock  boundary  layer  interaction  at  MB  =  2.9, 
0  =  29°  and  laminar  flow  Re,  =  1 04  /  unit  length 
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Figure  3.  Effect  of  surface  temperature  on  shock  boundary  layer  interaction  at 
Mx  =  2.9,  6  =  29°  and  laminar  flow  Rei  =  104  /  unit ;  top  heated  wall  Tw  =  3.5Tr ; 
middle  Tw  =  2.57.  (approximately  adiabatic);  and  bottom  wall  cooling  7W  =  1.57. 
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0.1  Governing  aerodynamic  equations 
0.2  Euler  equations 

The  motion  of  a  fluid  without  any  viscous  and  thermal  conduction  effects 
is  described  by  the  system  of  Euler  equations.  The  differential  form  of  this 
system  is: 


£(U) 


dU 

~dt 


+  V-F(U)  =  0, 


(1) 


where  U  is  the  conservative  variable  state  vector: 


(  P\ 

U  =  pU  , 
pv 

\PEj 

and  F(U)  is  the  inviscid  flux  tensor  with  vector  components: 


f 


pu  \ 
pu 2  +  p 
puv 

\(pE+p)uJ 


g 


pv  \ 
puv 

pv 2  +  p 

\(pE+p)vJ 


The  specific  energy  E  is  the  sum  of  the  specific  internal  energy  e  and  the 
kinetic  energy: 


E  =  e+t(n2  +  v2).  (2) 

Furthermore,  with  the  assumption  of  a  calorically  perfect  gas  the  specific 
internal  energy  and  pressure,  are  related  through  the  constitutive  relations: 


e  =  C'VT.  p  =  ( 7-1) 


}E-^(u2 


v2) 


0.3  Discretization  method 

The  governing  equations  of  fluid  motion,  given  in  Eq.  (1),  are  discretized 
with  the  Discontinuous  Galerkin  (DG)  method. 
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0.4  Spatial  Discontinuous  Galerkin  discretiza¬ 
tion 

0.4.1  Discontinuous  weak  formulation  for  the  Euler 
equations 

A  tessellation  of  an  arbitrary  domain  i?,  of  the  flow  field,  into  non-overlapping 
elements  Qe  is  considered: 


and  defines  the  computational  mesh.  The  tessellation  may  be  composed  of 
either  straight  sided  or  curvilinear  general  shaped  elements.  The  physical 
field  over  domain  fl  is  approximated  by  a  C~x  function  U(x,  t),  constructed 
by  a  locally  continuous  approximation  Ue(x,  t)  over  each  element  f2e ,  as 
depicted  in  Fig.  1: 


U(x,t)  =  ^Ue(x,t). 

e 

Considering  a  single  element  l?e,  the  approximation  Ue(x,  t)  substituted  in 
Eq.  (1),  results  in  an  approximation  error,  or  residual ,  of  the  field  over  the 
elemental  region,  equal  to: 

fte(Ue(x,f))  =  £(Ue(x,t)),  (3) 

leading  to  the  residual  over  domain  Cl: 

ft(U(x,t))  =  ^ftc(Ue(x,*)).  (4) 

e 

As  a  finite  element  method,  the  DG  discretization,  minimizes  the  approx¬ 
imation  error  over  the  computational  domain  in  the  weighted  residual  sense. 
That  is,  forming  the  Legendre  inner  product  of  7£(U(x,  t))  with  an  arbitrary 
test  or  weight  function  v3 ,  and  setting  it  equal  to  zero: 

(ft(U(x,  t)),Vj)  =  0, 

leads  to: 


3 


17 


Figure  1:  Local  reconstruction  of  the  field  at  two  adjacent  elements. 


IQ 


<9U 

,—  +VjV-F(V) 


dn  =  o 


in 


5U 

V  n  ~  di  u  u 

3  dt 


/  Vvj  •  F(U)dU  +  /  V  •  VjF(V)<m  =  0. 

In  Jn 


Using  Gauss’  theorem,  the  last  equation  results  in: 

r  <9u 


r  Vj—dn  -  /  Vvj  ■  F(XJ)dn  +  <b  ^(U)  •  ndl  =  0,  (5) 

n  dt  Jn  Jon 


where  n  is  the  normalized  outward  vector  at  the  domain  boundary.  Alterna¬ 
tively,  Eq.  (5)  may  be  written  as: 


E 


gUe 

dt 


df2e  — 


Vvj  ■  F(XJe)df2e  + 


VjF( Ue)  •  n edl 


0, 


with  ne  denoting  the  normalized  outward  vector  at  the  element  boundary. 
However,  the  discontinuous  nature  of  the  solution  U,  permits  to  write  the 
residual  for  every  element  as  a  separate  equation: 


dUe 

dt 


df2e 


Vvj  •  F(\Je)dGe  + 


VjF(\Je)  •  n edl  =  0. 


(6) 
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Eq.  (6)  represents  the  weak  formulation  for  the  system  of  Euler  equations, 
under  the  assumption  of  a  C~l  approximation  of  the  flow  field  over  domain 

n. 

The  local  approximation  of  the  field  is  constructed  by  a  linear  combination 
of  elemental  functions  6|(x): 

U,(x,t)  =  ^U‘(t)!,J(x),  (7) 

k 

where  the  elements  of  the  coefficient  vector  Ug(f)  denote  the  degrees  of  free¬ 
dom  (DOF)  of  the  numerical  solution  at  every  element. 

Substituting  Ue(x,  t)  in  Eq.  (6),  results: 

f  Vjbi^dQe-  [  Vvj  ■  F(Ue)df2e  +  <f  VjF(\Je)-nedl  =  0,  (8) 

J  Qe  Ot  J  Qe 

which  is  a  system  of  4DOF  x  4DOF  equations.  As  a  Galerkin  method 
implies,  the  trial  functions  v3  are  equal  to  the  elemental  functions  6|(x). 

0.4.2  Elemental  operations 

In  order  to  handle  arbitrary  geometries,  it  is  imperative  to  numerically  eval¬ 
uate  all  the  integrals  appearing  in  Eq.  (8): 


Mkj  =  /  byfjdQe, 

(9) 

Joe 

V=  f  \7bl-F(\Je)df2e, 

(10) 

J  i?e 

S=  <f  bekF{  Ve)  ■  n edl, 

(11) 

df2e 


where  in  place  of  the  test  function  v3  the  basis  function  6)  has  been  sub¬ 
stituted.  Application  of  a  Gauss  type  numerical  integration,  requires  the 
definition  of  the  standard  element  region  flst  for  quadrilateral  elements,  de¬ 
picted  in  Fig.  2: 

The  integrals  given  in  Eqs.  (9)  and  (10)  over  the  standard  element  region 
are  equal  to: 
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p 

^ st 

Figure  2:  Standard  quadrilateral  element  rii,n2  G  [—1, 1]. 


Mkj  =  J  J  blbj\J\dnidri2,  (12) 

V  =  y1/1  (13) 

where  J  is  the  Jacobian  matrix  of  the  transformation  from  the  physical  to 
the  computational  space.  For  straight  sided  elements,  the  mapping  from  the 
physical  configuration  of  the  element  fte  to  the  standard  region  Qst,  is  given 
by  the  following  relations  for  quadrilateral  elements: 


(1  -  rii)  (1  -  ra2)  (l  +  ni)(l-n2) 
- ^+Xb—2 - 2 

(1  -  //,  )  (1  |  //•/)  (1  !  // i  )  ( 1  1  n2) 

+  Xd - ~ - o - r  Xc - - - z - 


(14) 


_  (1  —  n\)  (1  —  n2)  (l  +  ni)(l-n2) 

V  ~Va  2  2  +  VB  2  2 

(l-m)(l  +  n2)  (l  +  ni)(l  +  n2) 
+  v°—  —  +  yc—  — 


(15) 
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with  its  inverse: 


J1 


-  dy 

_  dx 

1 

dn  2 

dni 

]jf 

dy 

dx 

dn  2 

dn±  - 

0.4.3  Basis  functions 

The  choice  of  the  basis  functions  affects  the  behavior  and  computational 
efficiency  of  the  DG  discretization.  For  every  element  a  finite  element  space 
X  of  functions  is  defined,  as  the  set  of  orthogonal  polynomials  V\.  in  the 
interval  [—1, 1],  up  to  an  arbitrary  order  N : 


x  =  {Vk\  (Vk,Vj)  =  C(k,j)8kj},  ,0  <  k,j  <  N,  Ce  T. 

For  rectangular  elements,  the  basis  is  constructed  using  the  tensorial  prod¬ 
uct  of  Legendre  polynomials  tpp(n)  according  to  a  specihc  indexing: 

h  =  'ipp(n1)Tpq(n2),  -1  <  nu  n2  <  1  (16) 

with 

k  =  p  +  q(N  +  1),  0<p,q<N. 

If  a  local  approximation  of  order  N  is  applied,  then  the  number  of  bases  in 
X  is  equal  to  (N  +  l)2. 

In  Table  1  the  analytical  expressions  of  the  basis  functions,  over  the  stan¬ 
dard  quadrilateral  region  are  given,  and  plotted  in  Fig.  3. 


0.5  Applications 

0.6  Isentropic  vortex  convection 

In  Fig.  4  it  is  shown  the  computational  mesh  for  the  advection  of  an  isentropic 
vortex.  The  mesh  consists  of  2500  quadrilateral  elements.  A  third  order 
accurate  (P2)  scheme  is  used  over  a  time  interval  of  2  time  units. 

Fig.  5  depicts  the  initial  pressure  profile  and  in  Fig.  6  it  is  shown  the 
pressure  at  the  end  of  the  simulation  time. 
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9  9 
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z,  z 

2  2 

Table  1:  Basis  functions  for  a  second  order  approximation  over  the  standard 
quadrilateral  region. 

0.7  Flow  around  a  cylinder 

A  cylinder  with  a  diameter  of  1  unit  is  considered.  The  mesh  in  its  vicinity 
is  shown  in  Fig.  7  and  it  consists  of  3968  quadrilaterals. 
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(g)  (h)  (i) 


Figure  3:  Basis  functions  over  the  standard  quadrilateral  region:  (a)  (p.  q )  = 
(0,0),  (b)  (p,q)  =  (1,0),  (c)  (p,q)  =  (2,0),  (d)  (p,q)  =  (0,1),  (e)  (p,q)  = 
(1,1),  (f)  (p,q)  =  (2,1),  (g)  (p,q)  =  (0,2),  (h)  (p,q)  =  (1,2),  (i)  (p,q)  = 
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Figure  4:  Computational  domain 
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Figure  5:  Initial  pressure. 
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Figure  6:  Final  pressure  (P2)). 
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Figure  8:  Pressure  contours  around  cylinder. 
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Figure  10:  Density  contours  for  ringleb  flow  (PI). 


16 


30 


Figure  11:  Pressure  contours  for  ringleb  flow  (PI). 
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Figure  12:  Mach  number  contours  for  ringleb  flow  (PI). 
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Figure  13:  Density  contours  for  ringleb  flow  (P2). 
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Figure  14:  Pressure  contours  for  ringleb  flow  (P2). 
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Figure  15:  Mach  number  contours  for  ringleb  flow  (P2). 
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Figure  16:  Convergence  rates  for  ringleb  flow. 
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Discretization  method 

The  governing  equations  of  fluid  motion  are  discretized  with  the  Discontinuous 
Galerkin  (DG)  method.  A  remarkable  advantage  of  this  method  is  that  the  local  support 
of  the  solution  at  the  element  level,  makes  it  suitable  for  flow  simulations  with  discon¬ 
tinuities  with  high-order  of  accuracy,  and  moreover,  mesh  distortion  does  not  affect  the 
quality  of  the  solution  considerably  as  the  order  of  the  scheme  increases.  Furthermore, 
p-adaptation  is  easy  to  implement  permitting  the  increase  resolution  of  the  flow  field  at 
certain  regions. 


1  Spatial  Discontinuous  Galerkin  discretization 

1.1  Discontinuous  weak  formulation  for  the  Euler  equations 

A  tessellation  of  an  arbitrary  domain  of  the  flow  field,  into  elements  Qe  is  considered: 

^  =  |Jt2e, 

e 

and  defines  the  computational  mesh.  The  tessellation  may  be  composed  of  either  straight 
sided  or  curvilinear  general  shaped  elements.  The  physical  field  over  domain  Q  is  ap¬ 
proximated  by  a  C~x  function  U(x,  £),  constructed  by  a  locally  continuous  approximation 
Ue  (x,£)  over  each  element  i?e,  as  depicted  in  Fig.  1: 


u  cm)  = 

e 


Considering  a  single  element  the  approximation  Ue(x,  t)  substituted  in  the  system 
of  Euler  equations  results  in  an  approximation  error,  or  residual ,  of  the  field  over  the 
elemental  region,  equal  to: 


fte(Ue(x,i))  =£(Ue(x,£)), 

leading  to  the  residual  over  domain  El : 

ft(U(x,t))  =  £fte(Ue(x,t)). 

e 


(1) 

(2) 


As  a  finite  element  method,  the  DG  discretization,  minimizes  the  approximation 
error  over  the  computational  domain  in  the  weighted  residual  sense.  That  is,  forming  the 
Legendre  inner  product  of  7£(U(x,  £))  with  an  arbitrary  test  or  weight  function  v,  and 
setting  it  equal  to  zero: 
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Figure  1:  Local  reconstruction  of  the  field  at  two  adjacent  elements. 


(72.(U(x,  t)),v)  =  0, 


leads  to: 


/ 

JQ 


dJJ 

v— — h  vV  •  F(U) 
at 


dfi  =  0 


[  v^-dQ-  [  Vv-F(XJ)dI2+  [  V  •  vF(\l)dfl 
Jn  ot  J  q  Jq 

Using  Gauss 7  theorem,  the  last  equation  results  in: 


=  0. 


[  v^-dQ  -  [  Vv  F(XJ)df2  +  <f  vF( U)  •  ndl  =  0, 
JQ  Ot  Jn  JdQ 


(3) 


where  n  is  the  normalized  outward  vector  at  the  domain  boundary.  Alternatively,  Eq.  (3) 
may  be  written  as: 


Vu  •  F(\Je)dOe  + 


vF( Ue)  •  n e8l 


=  0, 
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with  ne  denoting  the  normalized  outward  vector  at  the  element  boundary.  However,  the 
discontinuous  nature  of  the  solution  U,  permits  to  write  the  residual  for  every  element  as 
a  separate  equation: 

[  -  f  Vu  •  F(ije)(We  +  I  vF( Ue)  •  n edl  =  0.  (4) 

Jne  vt  Jne  Jdne 

Eq.  (4)  represents  the  weak  formulation  for  the  system  of  Euler  equations,  under  the 
assumption  of  a  C-1  approximation  of  the  flow  field  over  domain  i2. 

The  local  approximation  of  the  field  is  constructed  by  a  linear  combination  of  ele¬ 
mental  basis  functions  6|(x): 


Ue(x,  t)  =  U e  (*)&jfe(x),  (5) 

k 

where  the  coefficient  vector  U *?(£)  denotes  the  degrees  of  freedom  (DOF)  of  the  numerical 
solution  at  every  element. 

Substituting  Ue(x,  t)  in  Eq.  (4),  results: 

[  vbek^Lldne-  [  Wv  ■  F(lJe)dne  +  <f  vF(\Je)-nedl  =  0,  (6) 

JQe  ut  JQe  Jdne 

As  a  Galerkin  method  implies,  the  trial  function  v  is  equal  to  the  elemental  function  6|(x), 
resulting  in  a  system  of  4DOF  x  4DOF  equations  for  the  system  of  Euler  equations  in 
two  dimensions. 

1.2  Basis  functions  and  standard  elemental  configuration 

The  choice  of  the  basis  functions  affects  the  behavior  and  computational  efficiency  of  the 
DG  discretization  [2-4].  When  constructing  a  p-type  basis  it  is  advantageous  to  use  the 
orthogonal  set  of  polynomials  called  Jacobi  polynomials.  Three  types  of  basis  functions 
may  be  constructed:  modal ,  nodal  and  mixed  modal-nodal. 

The  basis  functions  are  defined  over  a  standard  elemental  configuration,  which  per¬ 
mits  their  systematic  construction  and  implementation  of  any  numerical  operation  involv¬ 
ing  them. 

Application  of  a  Gauss  type  numerical  integration,  requires  the  definition  of  a  stan¬ 
dard  elemental  region  spanning  the  domain  [—  1,1]  X  [-1,  1].  In  Fig.  2,  the  transformation 
of  the  physical  quadrilateral  to  its  standard  elemental  configuration  Dqst  is  depicted,  and  in 
Fig.  3,  the  transformation  of  the  physical  triangle  to  its  standard  elemental  configuration 
is  depicted,  along  with  the  Qqst  region,  where  the  basis  functions  are  to  be  constructed 
and  all  the  numerical  operations  are  performed. 
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Figure  2:  Transformation  of  the  physical  quadrilateral  to  the  standard  quadrilateral  ele¬ 
ment  configuration  (ni,ri2  G  [—1,1]). 

For  every  element  a  finite  element  space  X  of  modal  functions  is  defined,  as  the  set 
of  orthogonal  polynomials  Vk  in  the  interval  [—1,1],  up  to  an  arbitrary  order  N: 

X  =  {Vk\(Vk,Vj)  =  C(k,j)6kj},  0  <k,j<N,  Ceft. 

For  quadrilateral  elements,  the  basis  is  constructed  using  the  tensorial  product  of 
Legendre  polynomials  ^p(n)  according  to  a  specific  indexing,  over  Qqst  [2]: 

bk(,n1,n2)  =  tpp(n1)'tpq(n2),  -l<ni,n2<l  (7) 

with: 

k  —  p  +  q(N  +  1),  0  <  p,  q  <  N. 

If  a  local  approximation  of  order  N  is  applied,  then  the  number  of  bases  in  T  is  equal  to 

(7V  +  1)2. 

For  triangular  elements,  the  basis  is  defined  over  and  constructed  using  the 
tensorial  product  of  Jacobi  polynomials  over  the  region  Qqst  (Fig.  3)  [2]: 

Mfi.fc)  =  (8) 

with: 
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Figure  3:  Transformation  of  the  physical  triangle  to  the  standard  triangular  element 
(£i,£2  G  [ —  1 , 1]  with  £i  +  £2  <  1)  and  standard  quadrilateral  element  configuration 
(ni,n2  G  [-1, 1]). 


(l  +  ni)(l-n2) 
=  - 3 - 


£2  =  fl2, 


and  according  to  the  following  indexing  [5]: 


k  =  p  +  (N  +  1  )q  —  0  <  p,  q  <  N  with  p  +  q  <  N. 

For  a  triangular  element  the  number  of  basis  for  a  local  approximation  of  order  TV,  is  equal 

toiN±lM±V' 

In  Tables  1  and  2  the  analytical  expressions  of  the  basis  functions,  over  the  standard 
quadrilateral  and  triangular  region,  for  a  third  order  expansion  are  given,  and  plotted  in 
Figs.  4  and  5,  respectively. 
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PN 

bk 

p,q 

bk 

bk 

(0,0) 

1 

(1,0) 

n\ 

(2,0) 

3ni+l 

2 

(0,1) 

n2 

(1,1) 

n\U2 

(2,1) 

3n\+1n2 

(0,2) 

3ri2+l 

2 

(1,2) 

3ri2+l 
n\  2 

(2,2) 

(3ni+l)  (3ri2+l) 

2  2 

Table  1:  Basis  functions  for  a  third  order  approximation  over  the  standard  quadrilateral 
region. 


p,q 

bk 

p,q  bk 

p,q 

bk 

(0,0) 

1 

Tro) 

(2,0) 

|(m  l)2  +  3m  2(iy2)^ 

(0,1) 

3ri2+l 

2 

(1,1)  m1-”25"2/3 

(0,2) 

|(n2  -  l)2  +  6n2  -  3 

Table  2:  Basis  functions  for  a  third  order  approximation  over  the  standard  triangular 
region. 

1.3  Elemental  operations 

In  order  to  handle  arbitrary  geometries,  it  is  imperative  to  numerically  evaluate  all  the 
integrals  appearing  in  Eq.  (6): 


Mkj  =  /  blVjdQe , 

(9) 

Jne 

V  =  [  \/bek-F(tSe)dne, 

(10) 

J  fie 

s  =  l  b%F{ Ue)  •  n e8l, 

(11) 

dQe 


where  in  place  of  the  test  function  v  the  elemental  basis  function  be-  has  been  substituted. 
In  the  present  work  Gaussian  numerical  integration  is  used. 

For  quadrilateral  elements,  the  integrals  given  in  Eqs.  (9)  and  (10)  are  then  com¬ 
puted  by  the  weighted  summations  (n  =  ni^,n2,m): 


Mkj  =  J  j  bkbj\J\dnidn2  = 

Qi  Q2 

EE  WiWmbk{n)bj{n)  |J(n)|, 

2=1  m=  1 


(12) 
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V  = 


•  X7bk  •  F(Ue)| J\dnidri2  = 


Q  i  Q  2 

J_1(n)  •  V6fc(n)  •  F(Ue(n)|J(n)|, 


(13) 


2=1  m— 1 

and  for  triangular  elements,  the  same  integrals  are  computed  by  the  following  weighted 
summations: 


Mkj 


=  J  J  2bkbj\3\dZ1d&  = 

J  J  bkbj\3\(-  ^-)dn1dn2  = 


Q  i  Q2 

EE  WiWmbk(n)bj  (n)  |  J  (n)  | 

2=1  m=l 


(14) 


V  =  /  /  &  J"1  •  V6fc  •  F(Ue)|JKi^2  = 

J1  J1  J-1  •  Vbk  •  F(Ue)|J|(t^)dmdn2 

Qi  Q2 

E  E  J-1(n)-V6fc(n)-F(Ue(n))|J(n)| 

2=1  m=l 

where  |J|  is  the  Jacobian  of  the  transformation,  J-1  is  the  inverse  Jacobian  matrix  of 
the  transformation  from  the  physical  to  the  computational  space  and  Q 1  and  Q2  is  the 
number  of  quadrature  points  in  the  n\  and  U2  direction  respectively.  Matrix  M  is  called 
the  mass  matrix  and  possesses  a  diagonal  structure  only  for  straight  sided  triangles  and 
for  straight  sided  quadrilaterals  with  parallel  edges. 

The  surface  integral  in  Eq.  (11)  is  computed  by  the  following  weighted  summation, 
both  for  triangles  and  quadrilaterals: 


/ 1  -  ri2,m\ 

v  2 


S  =  y  bkF(XJe)  •ne\le\dn  = 

Qm  _1  _  (16) 

^  ^  iem6^i?(Ue(7i/m))  •  ne|/e|, 

m=  1 
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where  Qm  is  the  number  of  quadrature  points,  \le\  is  the  edge  length,  H  the  numerical 
flux  at  the  element  boundary  and  is  the  normalized  outward  vector  of  the  local  element 
edge  at  the  computational  space. 


1.4  Numerical  flux 


The  numerical  flux  is  chosen  as  the  local  Lax-Friedrichs  flux  due  to  its  low  computational 
cost: 

H(te{nm))  -<  =  \  [(/+  +  Dnx  +  0+  +  g-)ny\  -  \k  (U“  -  U+)  ,  (17) 

where  the  superscripts  (+)  and  (-)  denote  the  elements  sharing  the  same  edge,  k  is  the 
spectral  radius  of  the  system  in  the  direction  normal  to  the  edge  and  is  equal  to: 

k  =  \un  |  +  c, 


and  it  is  computed  using  the  Roe  averaged  variables: 


pu  — 


Vp+-u+  +  sfp~  u 

Vp^+Vp11 


(18) 


Pp+v+  +  sfp~  v 

pv~  +  V~p~  ’ 

r  _  +  \fp~  h 

WWA  ’ 


c=  (7-1  )yh-  ^(pu2  +  pv2), 

with  h  =  E  +  p. 


(19) 


(20) 


(21) 


1.5  Selection  of  Gauss  type  numerical  integration 

Gauss  numerical  integration  is  classified  according  to  the  way  the  end  points  of  the  domain 
of  integration  are  handled  [2].  The  classification  is  given  in  Table  3,  with  the  corresponding 
order  of  accuracy  using  Q  quadrature  points: 

The  GL  type  is  used  both  for  the  volume  and  the  surface  integrations  due  to  its 
minimum  number  of  quadrature  points  needed  for  a  specific  order  of  accuracy.  Moreover, 
in  Eqs.  (12),  (14)  and  (16),  it  is  seen  that  the  product  of  two  polynomials  of  order  N  is 
integrated.  Thus,  the  same  number  of  quadrature  points  is  used  for  volume  and  surface 
integrations  and  equal  to: 


Q\  —  Q‘2  —  Qm  —  N  +  1 


(22) 
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Type 

End  points 

Order  of  accuracy 

Gauss- Legendre  (GL) 

no  end  points 

2Q-  1 

Gauss-Radau  (GR) 

only  one  end  point 

2Q  —  2 

Gauss-Lobatto-Legendre  (GLL) 

both  end  points 

2Q-3 

Table  3:  Types  of  Gauss  numerical  integration. 

1.6  Mapping  between  physical  and  computational  space 

The  computation  of  the  integrals  in  Eqs.  (12),  (14),  (13)  and  (15)  requires  the  evaluation 
of  the  Jacobian  matrix  at  every  quadrature  point.  This  necessitates  the  construction 
of  a  mapping  from  the  physical  to  the  computational  space.  Usually,  according  to  the 
polynomial  degree  Nb  of  the  physical  space  representation,  curved  elements  are  classified 
in  sub-,  iso-  and  super-parametric  elements,  by  comparing  the  order  Nb  with  that  of  the 
order  N  of  the  discretization  scheme.  In  detail: 

Nb  <  N  :  sub-parametric  element 
Nb  —  N  :  iso-parametric  element 
Nb  >  N  :  super-parametric  element 

For  straight  sided  elements,  the  mapping  from  the  physical  configuration  of  the  ele¬ 
ment  to  the  standard  elemental  region,  is  given  by  the  following  relations  for  quadri¬ 
lateral  elements: 


_  (1  -  ni)  (1  -  n2)  (1 +  ni)  (1 -n2) 

X  XA  n  r,  +  XB  n  r> 


(l-ni)(l  +  n2)  (l  +  m)(l  +  n2) 
+  XD - n - o -  +  XC - n - o - : 


(1 -ni)  (1 -n2)  (l  +  ni)(l-n2) 
V  =VA - - o - i-  Vb — ^ - o — 


(1  —  ni)  (1  +  n2)  (l  +  m)(l  +  n2) 
+  Vd - 7, - - I-  VC - - o - : 


and  for  triangular  elements: 


(23) 


(24) 


X  —  XA~ 


(1  — ni)(l  — n2)  r  __  (l  +  ni)(l-n2)  (  1  +  n2 

~2  ’ 


+  %b 


+  xc- 


(25) 
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(l  -  m)  (l 
y  =  va — j - 


n2)  (l  +  ni)(l-n2)  1  +  n2 

- 1-  Ub - ^  7/^ - 


+  yc- 


The  Jacobian  matrix  is  defined  as: 


-  dx 

dx  - 

dni 

dn2 

dy 

dy 

-dn  2 

dn2  - 

(26) 


(27) 


with  its  inverse  equal  to: 


j-1 


i 

j 


dy  _  dx 
dri2  dn  2 

_  dy  dx 

dn±  dn\ 


Numerical  results 


(28) 


The  numerical  method  described  in  the  previous  section  is  validated  for  the  compu¬ 
tation  of  the  flow  over  a  cylinder  at  a  low  Mach  number  (M^  =  0.15)  in  order  to  approach 
the  incompressible  flow  limit.  Solutions  are  computed  with  meshes  of  quatrilateral  and 
triangular  element.  Implementation  of  mixed  type  meshes  is  underway. 


2  Inviscid  flow  over  a  cylinder 

The  analytic  solution  for  inviscid,  incompressible  flow  over  a  circular  cylinder  is  the  well 
known  potential  flow  solution  which  is  symmetric  with  respect  to  the  y  axis.  The  sur¬ 
face  pressure  coefficient  of  the  potential  flow  solution  is  given  by  the  following  analytic 
expression: 


Cp  —  1  —  4  sin2  ( 6 ).  (29) 

The  quadrilateral  element  mesh  used  for  the  numerical  solution  is  shown  in  Fig.  6. 
The  computed  Mach  number  with  PI  polynomial  basis  is  shown  in  Fig.  7.  It  is  clear  that 
an  entropy  layer  is  created  and  causes  ficticious  separation  at  the  back  stagnation  point. 
This  entropy  layer  can  be  significantly  reduced  and  almost  eliminated  when  the  curved- 
element-type  boundary  condition  (CBC)  of  [6]  is  used.  The  improvements  achieved  with 
the  application  of  CBC  improved  boundary  treatment  is  shown  in  Fig.  8.  A  quantititave 
comparson  is  shown  in  Fig.  9.  Clearly  the  improvements  of  the  CBC  treatment  are 
significant . 

The  triangular  element  mesh  used  for  the  numerical  solution  is  shown  in  Fig.  10. 
The  computed  Mach  number  with  PI  polynomial  basis  is  shown  in  Fig.  11.  It  is  clear  again 
that  an  entropy  layer  is  created.  The  increase  of  the  order  of  accuracy  does  not  diminish 
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the  numerical  entropy  layer  and  the  solution  of  Fig.  12  computed  with  P2  polynomial 
bases  shows  the  same  trend.  The  improvements  achieved  with  the  application  of  CBC 
improved  boundary  treatment  is  shown  in  Fig.  13.  A  quantititave  comparsons  for  the 
solutions  computed  with  PI  and  P2  bases  are  shown  in  Figs.  14  and  15,  respectively.  The 
improvements  of  the  CBC  treatment  are  significant  and  the  third  order  accurate  solution 
of  Fig.  15  agrees  with  the  analytic  result  to  plotting  accuracy. 
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Figure  4:  Basis  functions  over  the  standard  quadrilateral  region:  (a)  (p,q)  =  (0,0),  (b) 
(p,q)  =  (1,0),  (c)  (p,q)  =  (2,0),  (d)  (p,q)  =  (0,1),  (e)  (p,q)  =  (1,1),  (f)  (p,q)  =  (2,1), 
(g)  (p,q)  =  (0,2),  (h)  (p,q)  =  (1,2),  (i)  (p,q)  =  (2,2). 
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(a) 


(b) 


(c) 


(d)  (e)  (f) 

Figure  5:  Basis  functions  over  the  standard  triangular  region:  (a)  (p,  q)  =  (0,0),  (b) 
( P,Q )  =  (1,0),  (c)  (p,q)  =  (2,0),  (d)  (p,q)  =  (0,1),  (e)  (p,q)  =  (1,1),  (f)  (p,q)  =  (0,2). 
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Figure  6:  Structured  mesh  around  a  cylinder. 
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Figure  7:  Mach  contours  with  no  CBC  (PI)  for  the  structured  mesh. 
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Figure  8:  Mach  contours  with  CBC  (PI)  for  the  structured  mesh. 
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Figure  9:  Line  plots  of  computed  and  analytical  pressure  coefficient  for  the  structured 
mesh. 
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Figure  11:  Mach  contours  with  no  CBC  (PI)  for  the  unstructured  mesh. 
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Figure  12:  Mach  contours  with  no  CBC  (P2)  for  the  unstructured  mesh. 
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Figure  13:  Mach  contours  with  CBC  (P2)  for  the  unstructured  mesh. 
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Figure  15:  Line  plots  of  computed  and  analytical  pressure  coefficient  for  the  unstructured 
mesh  (P2). 
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Summary 

The  discontinuous  Galerkin  method  for  the  two-dimensional  Euler  equations  was 
extended  to  mixed-type  (triangular-quadrilateral)  meshes.  Numerical  experiments 
demonstrated  that  the  design  accuracy  level  is  retained  for  both  type  of  meshes  separately 
and  for  the  global  solution  with  mixed-type  meshes.  Furthermore,  the  capability  of 
numerical  solutions  with  p-type  adaptivity  on  fixed  meshes  was  implemented.  Numerical 
results  demonstrate  that  significant  savings  in  computing  time  can  be  obtained  by  using 
mixed-type  meshes  and  p-adapted  numerical  solutions.  These  savings  are  expected  to  be 
even  larger  for  viscous  flows. 

Mixed-type  mesh  and  p-adaptivity 

The  numerical  method,  which  was  described  in  detail  in  the  previous  quarterly 
report,  it  has  the  following  features: 

1 .  It  is  implemented  in  the  computational  space. 

2.  Uses  hierarchical  tensor  product  Jacobi  polynomial  basis  functions. 

3.  Performs  numerical  integration  of  the  volume  and  line  integrals  that  appear  in  the 
DG  methods  on  the  unit  square. 

4.  The  basic  element  in  the  transformed  space  is  the  unit  square  and  triangular 
elements  are  represented  via  the  collapsed  coordinate  system. 

These  features  make  the  method  general,  easy  to  extend  to  three  dimensions,  enable 
discretization  on  mixed-type  meshes  (because  all  operations  for  both  triangles  and 
quadrilaterals  of  the  physical  space  are  performed  for  the  same  unit  square),  and  use  of 
arbitrary  p-adapted  numerical  solutions  because  the  tensor  product  basis  functions  on  the 
unit  square  (for  both  triangles  and  quadrilaterals  of  the  physical  space)  are  hierarchical. 
All  these  features  were  exploited  to  solve  test  problems  on  mixed-type  meshes  and 
demonstrate  the  computational  saving  of  p-adapted  solutions.  In  this  report,  p-adapted 
solutions  are  demonstrated  only  for  artificially  segregated  meshes.  For  example  near  to 
the  cylinder  region  of  mesh  over  a  cylinder  is  flagged  as  a  region  where  the  solution  is 
performed  using  P2  polynomial  bases  (third-order  accurate  solution)  while  in  the  rest  of 
the  domain  the  numerical  solution  is  obtained  using  second  order  accuracy. 

For  the  general  case  it  is  desirable  to  be  able  to  increase  the  polynomial 
approximation  locally  bases  on  criteria  resulting  from  the  physics  of  the  computed 
solution.  In  order  to  establish  procedures  for  p-adaptive  numerical  solutions  on  fixed 
meshes  based  on  the  computed  flow  features  we  the  following  example  is  considered. 
The  following  form  of  the  Euler  equations  is  solved  on  a  triangular  mesh 


P  ' 

0 
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pu 

dt  8g 

)  +  —  +  —  =  { 
8x  8y 
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1  2 

The  source  terms  generate  the  Taylor-Rayleigh  instability  ’  with  the  initial  and 
boundary  conditions  shown  in  Fig.  1 


P=  1,  U  =  L>=  0 
p  =  2.5 


p =i 

Figure  1.  Initial  and  boundary  conditions  for  the  Taylor-Rayleigh  instability 

The  heavy  gas  is  on  the  bottom,  the  normal  velocity  component  is  initially  perturbed,  and 
the  linearly  varying  pressure  is  continuous  at  the  interface.  Numerical  solutions  for  this 
problem  performed  with  WENO  schemes3  or  the  DG  method  for  increasing  order  of 
accuracy  on  fixed  meshes  with  size  h  =  1/240, 1/480, 1/960,  1/1920  or 

h  =  h  ,  h  /  2,  h  /  4,  h  /  8  have  demonstrated  that  with  the  increase  of  resolution  (h 

refinement  or  p  increase)  results  in  capturing  of  finer  and  finer  flow  features.  We  attempt 
to  resolve  fine  feature  with  local  p-type  refinement  only  at  the  regions  of  steep  flow 
gradients  on  fixed  h0  size  meshes.  Currently  steep  density  gradients  are  detected  and  the 

polynomial  approximation  in  these  regions  is  increased,  for  example  P8  approximations 
of  the  approximate  solution  are  used  for  the  maximum  of  the  density  gradient.  At 
neighbouring  to  P8  cells  P7  approximation  is  used  and  the  order  of  approximation 
progressively  diminishes  while  we  move  away  from  regions  with  large  density  gradients. 
The  order  of  approximation  increases  or  drops  according  to  the  magnitude  of  density 
gradient  as  the  flow  develops.  The  full  locally  p-adaptive  scheme  and  the  computed 
solutions  will  be  shown  in  the  next  report.  Similar  approach  will  be  used  for  the  accurate 
and  efficient  computation  of  viscous  flow  features  and  flows  with  shocks, 
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Results 


The  p-adaptive  capabilities  on  mixed-type  meshes  are  demonstrated  for  the 
computation  of  inviscid  flows  over  a  cylinder  and  a  NACA-0015  airfoil.  The  mixed-type 
mesh  used  for  the  computation  of  the  flow  over  the  cylinder  is  shown  in  Fig.  2.  The  near 
field  is  computed  on  a  quadrilateral  mesh  and  the  far  field  is  computed  on  a  triangular 
mesh.  On  the  quadrilateral  mesh  any  type  of  approximation  can  be  used.  In  this  example, 
the  order  of  accuracy  in  both  the  quadrilateral  and  triangular  meshes  was  fixed.  An 
example  where  the  inner  solution  was  computed  with  P2  basis  functions  (third  order 
accurate)  and  the  outer  solution  on  the  triangular  mesh  is  second-order  accurate  (PI 
polynomials)  is  shown  in  Fig.  3.  The  computed  pressure  coefficient  at  My  =0.15  is 

compared  with  the  exact  value  for  incompressible  inviscid  flow  over  a  cylinder  in  Fig.  4. 
For  this  computation  the  boundary  of  the  cylinder  was  approximated  with  linear  elements 
and  the  boundary  treatment  proposed  by  Krivodonova  and  Berger4  was  used,  which  yield 
effectively  quadratic  approximation  of  the  boundary.  Incorporation  of  solid  the  boundary 
representation  of  Ref.  4  significantly  increases  the  accuracy  of  the  numerical  solution  and 
almost  eliminates  artificial  numerical  entropy  layers  that  are  generated  from  low  order 
representation  of  the  curved  boundary. 

Next,  the  inviscid  flow  over  a  NACA-0012  at  low  incidence  a  =  2  deg.  and 
M_fj  =  0.4  was  computed  on  a  mixed  type-mesh.  The  numerical  mesh  is  shown  in  Fig.  5. 

The  computed  pressure  and  Mach  number  contours  are  shown  in  Fig.  6  and  7, 
respectively.  The  gain  of  using  mixed-type  meshes  and  p-adaptivity  is  more  pronounced 
for  computation  of  viscous  flows.  Currently,  implementation  of  viscous  terms  for  the  DG 
method  is  underway.  The  narrow  stencil  implementation  of  viscous  terms  for  the  DG 
method  and  results  from  computations  of  viscous  laminar  flows  over  the  cylinder  and  the 
airfoil  will  be  presented  in  the  next  quarterly  report. 
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Figure  2.  Mixed-type  mesh  for  the  computation  of  inviscid  flow  over  the  cylinder. 
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Figure  4.  Comparison  with  the  exact  solution. 


Figure  5.  Mixed-type  mesh  for  the  computation  of  inviscid  flow  over  the  NACA- 
0012  section. 
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Figure  6. 


Figure  7. 


Computed  pressure  distribution  at  Mx  =  0.4 . 


Computed  Mach  number  distribution. 
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Summary 

Current  trends  on  high  performance  computing  are  moving  towards  the  deploy¬ 
ment  of  several  cores  on  the  same  chip  of  modern  processors  in  order  to  achieve 
substantial  execution  speedup  through  the  extraction  of  the  potential  fine-grain 
parallelism  of  applications.  At  the  forefront  of  this  trend  we  find  nowadays 
the  modern  Graphics  Processors  Units  (GPUs),  which  due  to  their  simplistic 
design  are  able  to  encompass  hundreds  of  independent  processing  units  on  a 
single  chip  in  contrast  to  their  respective  CPUs,  which  at  the  moment  include 
only  a  few  cores  on  the  same  chip.  In  order  to  study  the  potential  speedup  of 
computationally  intensive  applications  that  utilize  the  many-core  architecture 
of  GPUs,  this  report  presents  a  highly  accelerated  implementation  of  the  finite- 
difference  weighted  essentially  non-oscillatory  (WENO)  scheme  and  the  discon¬ 
tinuous  Galerkin  finite  element  method.  These  discretization  techniques  are 
suitable  for  direct  numerical  simulations  (DNS)  large  eddy  simulations  (LES) 
of  compressible  turbulence,  however,  they  require  large  computing  resources 
in  order  to  achieve  high  Reynolds  numbers.  Our  implementation  targets  on 
large-scale  simulations  using  the  CUDA  parallel  programming  and  constitues  a 
paradigm  of  GPU’s  applications  in  CFD.  The  results  of  the  current  implemen¬ 
tation  demonstrate  that  such  a  computationally  intensive  application  could  be 
highly  accelerated  running  on  the  NVIDIA  Tesla  C1070  many-core  GPU. 

1  Background 

Numerical  simulation  of  transition  and  turbulence  in  high-speed  flows  is  daunt¬ 
ing  because  of  the  difficulty  in  ensuring  high-resolution  and  fidelity  in  capturing 
small  disturbances  in  an  environment  containing  sharp  gradients  associated  with 
shocks  and  relatively  thin  boundary  layers.  Even  with  use  of  higher-order  ap¬ 
proaches  many  shock  capturing  methods  introduce  spurious  (numerical)  noise 
which  contaminates  the  solution  beyond  acceptable  limits,  which  can  lead  to 
significant  damping  of  turbulence  fluctuations,  and  masks  the  effects  of  the 
subgrid-scale  (SGS)  models.  Specification  of  boundary  conditions  ensuring  that 
the  numerical  discretizations  remain  stable  is  also  a  critical  issue.  Robust,  high- 
fidelity  and  accuracy  methodologies  that  are  capable  of  treating  complex  flows 
and  are  applicable  for  high-resolution  numerical  solutions  in  complex  domains 
are  therefore  solemnly  required. 
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The  objective  of  the  present  work  is  to  apply  conservative,  high-order  accu¬ 
rate,  shock-capturing  methods  that  are  suitable  for  the  simulation  of  super¬ 
sonic  flows  in  domains  with  moderate  complexity  for  massively  parallel  ar¬ 
chitectures.  A  high  order  finite  difference  weighted  essentially  nonoscillatory 
(WENO)  scheme  and  the  DG  method  are  used  for  the  numerical  solution  of 
the  governing  equations.  The  main  disadvange  of  WENO  discretization  is  that 
with  the  increase  of  the  order  of  accuracy  increases  the  width  of  the  computa¬ 
tional  stencil.  As  a  result,  parallelization  with  traditional  domain  decomposition 
methods  could  become  inefficient.  On  the  other  hand  the  in  general  inherently 
unstructured  data  access  of  the  DG  method  requires  use  of  complex  domain 
decomposition  procedures.  In  this  work  we  only  use  structured  data  access  and 
demonstrate  acceleration  with  the  use  of  graphics  processors  units  (GPUs). 

The  implementation  of  the  accelerated  finite  difference  WENO  scheme  that 
we  present  here  is  based  on  the  CUDA  parallel  programming  model  [1] .  CUDA 
comprises  a  programming  environment  that  provides  an  extention  to  the  C 
programming  language  accompanied  by  the  necessary  libraries  to  support  the 
execution  of  code  on  top  of  the  NVIDIA  GPUs.  Using  CUDA  the  program 
must  be  structured  on  distinct  portions  which  are  called  kernels  and  are  des¬ 
tined  to  be  executed  on  the  side  of  the  GPU.  The  execution  of  kernels  follows  the 
programming  paradigm  of  SIMT  (Single  Instruction  Multiple  Threads)  which 
resembles  the  traditional  programming  model  of  vector  processors,  the  SIMD 
(Single  Instruction  Multiple  Data)  paradigm.  Nevertheless  SIMT  is  less  restric¬ 
tive  than  SIMD  and  allows  programmers  to  write  either  data-parallel  code  as 
in  the  case  of  SIMD  or  arbitrary  thread-based  parallel  code.  In  order  to  initiate 
a  kernel  execution  using  CUDA  a  mapping  of  application  threads  into  blocks 
and  accordingly  a  mapping  of  these  blocks  on  a  grid  must  be  provided  by  the 
programmer. 

Data  parallel  applications  are  serious  candidates  among  the  applications  that 
have  the  potential  to  achieve  a  significant  acceleration  through  the  utilization  of 
many-core  GPUs.  Similar  efforts  have  recently  taken  place  in  the  area  of  scien¬ 
tific  and  recent  demonstrations  in  CFD  concern  simulations  on  both  structured 
[2]  and  unstructured  [3],  [4]  meshes.  The  work  that  we  present  here  proves  that 
CFD  applications  with  similar  characteristics  can  benefit  from  heterogeneous 
computing  systems  that  involve  CPU  and  GPU  co-processing. 

2  Acceleration  on  Graphics  Processors 

In  thus  section  the  procedure  followed  to  reach  the  code  that  was  able  to  par¬ 
allelize  on  GPUs  is  described.  The  steps  comprising  this  procedure  starting 
from  the  FORTRAN  version  to  the  final  CUDA  version  are  given  in  detail.  Our 
approach  for  GPUs  parallelization  is  still  based  on  traditional  domain  decom¬ 
position  techniques  therefore  domain  decomposition  is  described  first. 
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2.1  Domain  decomposition 

Similar  to  other  computational  techniques  used  in  the  CFD  community,  the  high 
order  WENO  scheme  for  the  numerical  solution  of  the  Euler  and  Navier-Stokes 
equations,  has  been  initially  implemented  in  FORTRAN-90  for  single  block 
structured  meshes.  14  Parallelization  of  the  code  for  clusters  was  subsequently 
achieved  through  the  use  of  the  message  passing  protocol  MPI  and  domain 
decomposition.  On  the  MPI  parallel  version,  the  domain  decomposition  was 
performed  by  subdividing  the  domain  only  along  the  axial  direction.  In  order 
to  study  the  potential  acceleration  of  the  code  on  heterogeneous  schemes  that 
involve  CPU  and  GPU  co-processing  we  decided  to  port  our  application  on  the 
C  language  although  the  first  parallelizing  compiler  for  CUDA  and  Fortran  have 
been  released  by  the  time  that  our  research  was  taking  place  [5].  According  to 
that  decision,  the  FORTRAN-90  parallelized  version  of  the  code  formed  the 
basis  for  the  shared  memory  parallel  implementation  on  C/OpenMP  that  we 
evaluate  in  this  paper.  The  decomposition  of  the  computational  domain  for 
OpenMP  multi-threaded  execution  is  depicted  in  the  schematic  of  Fig.  1.  The 
generalized  coordinates  form  of  the  governing  equations  is  solved  and  the  global 
computation  of  dimension  Imax  x  Jmax  x  Kmax  is  subdivided  along  the  i  ,  or 
direction,  which  is  often  the  streamwise  direction,  in  N  subdomains  of  dimension 
{Imax/N}-\-2m  x  Jmax  x  Kmax,  where  m  is  the  number  of  the  ghost  points 
required  for  data  transfer.  The  number  of  ghost  points  varies  with  the  order  of 
the  base  scheme  and  for  the  5th  order  WENO  is  m=S,  while  for  the  9th  order 
WENO  is  m=5. 

The  C/OpenMP  version  of  the  WENO  solver  was  validated  by  comparing 
results  of  the  baseline  FORTRAN-90  code  for  several  cases  for  both  single  pro¬ 
cessor  solutions  and  solutions  with  domain  decomposition.  Next  parallel  imple¬ 
mentation  on  GPUs  was  implemented  and  the  C  language  version  was  ported 
into  CUDA. 

2.2  GPU  processing 

In  the  presence  of  considerably  larger  number  of  cores  on  the  GPU  as  opposed 
to  the  CPU,  in  order  to  fully  utilize  the  afforded  computational  resources,  we 
extended  the  domain  decomposition  along  a  second  direction.  In  that  sense 
every  element  on  the  2D  plane  that  is  mapped  on  the  GPU  is  assigned  on  a 
different  thread  and  corresponds  to  the  processing  of  a  group  of  points  along 
the  third  axis.  This  strategy  of  a  2D  domain  decomposition  that  is  depicted  in 
Fig.  2  and  is  exercised  both  on  the  single  GPU  mode  and  the  multi-GPU  mode 
adds  additional  capabilities  for  large  scale  DNS  or  LES  simulations  where  large 
number  of  points  is  used  along  the  streamwise  and  normal  to  the  wall  directions 
while  the  number  of  points  in  the  spanwise  direction,  which  is  often  considered 
periodic,  is  smaller.  Moreover,  to  maximize  the  throughput  with  appropriate 
expression  of  parallelism,  the  number  of  threads  on  each  block  that  handles  the 
respective  points  has  been  set  as  a  multiple  of  32,  which  corresponds  to  the 
warp  size  of  the  GPU.  This  was  decided  because  the  warp  is  the  minimum  unit 
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that  can  be  scheduled  at  once  on  a  single  stream  multiprocessor  of  the  GPU. 

Along  with  the  need  to  express  a  high  degree  of  parallelism  on  our  scheme, 
the  parameter  that  we  considered  and  has  a  great  impact  on  performance  is  the 
memory  utilization,  especially  in  the  case  of  multi-GPU  execution.  For  that 
reason,  firstly,  we  had  to  minimize  data  transfers  between  the  host  memory 
and  the  device  memory.  Therefore,  in  the  presented  implementation  all  the 
computations  that  are  required  by  the  Runge  Kutta  time  stepping  for  updating 
the  numerical  solution  have  been  assigned  to  the  GPU,  even  though  some  of  the 
computations  do  not  exhibit  a  high  degree  of  parallelism. 

Subsequently  on  the  side  of  each  GPU,  in  order  to  utilize  appropriately 
the  memory  hierarchy  of  the  device,  we  use  the  texture  memory  space  for  the 
placement  of  read-only  data  that  are  computed  before  the  time  evolution  of  the 
simulation  begins.  This  happens  in  order  to  benefit  from  the  caching  mechanism 
of  the  texture  memory,  which  is  by  design  optimized  for  spatial  locality.  The 
results  that  are  computed  on  each  time  step  are  stored  in  global  -  read/write 
-  memory.  In  addition,  due  to  the  specification  of  CUDA  architecture  that  for¬ 
bids  communication  between  threads  that  reside  on  distinct  blocks,  each  block 
computes  it’s  ghost  points  instead  of  exchanging  them  with  neighboring  blocks. 

Concerning  the  necessary  synchronization  that  is  imposed  by  the  computa¬ 
tion  scheme,  it  is  restricted  on  barrier  synchronization  between  the  threads  of 
the  same  common  block  inside  each  CUDA  kernel.  In  that  way  we  signify  the 
update  of  auxiliary  local  variables  required  for  Roe’s  averaging  and  other  local 
arrays  for  the  right  and  the  left  eigenvectors  that  are  evaluated  at  the  average 
state.  The  need  for  mutual  exclusion  is  minimized  on  the  atomic  max  opera¬ 
tion  that  is  used  in  order  to  compute  the  maximum  eigenvalue  required  for  the 
construction  of  the  Lax-Friedrisch  numerical  flux. 


3  Results 

In  this  section  the  cases  chosen  for  the  validation  of  the  parallelization  procedure 
and  evaluation  of  performance  are  presented.  Next  the  performance  is  evalu¬ 
ated  for  both  inviscid  and  viscous  flow  computations.  Two  examples  were  used 
to  evaluate  the  performance  of  the  current  GPU  implementation  of  the  high 
order  numerical  method.  Oblique  inviscid  shock  reflection,  and  simulation  of 
Rayleigh-Taylor  instability.  [6],  [7]  The  code  is  three  dimensional  and  numerical 
simulations  for  two  dimensional  cases  were  performed  by  using  appropriate  to 
the  scheme  order  number  of  planes  in  third  dimension.  Oblique  shock  reflection 
from  a  solid  surface  is  a  classical  compressible  code  validation  case  while  the 
interaction  of  an  oblique  shock  with  a  boundary  layer  is  a  problem  of  current 
interest  where  LES  simulations  could  shed  light  to  the  dynamics  of  the  interac¬ 
tion.  The  GPUs  parallelized  high  order  method  was  tested  for  this  problem  first. 
Numerical  solutions  for  a  sequence  of  meshes  of  different  sizes  were  obtained  in 
order  to  evaluate  the  performance  on  GPUs. 

The  computational  domain  for  the  simulations  of  the  Rayleigh-Taylor  in¬ 
stability  is  the  box  (1  x  0.25)  in  two  dimensions  and  (1  x  0.25  x  0.25)  in 
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three  dimensions.  The  “heavy”  fluid  is  on  the  left  site  and  has  density  pL  =  2 
while  the  “light”  fluid  on  the  right  has  density  pR  =  1.  The  interface  is  be¬ 
tween  the  two  fluids  is  at  x  =  1/2  and  the  variation  of  the  initial  pressure 
is  linear  throughout  the  domain.  The  initial  pressure  in  the  domain  of  the 
heavy  fluid  on  the  left  is  Pl{%)  =  1  +  2x,  while  the  variation  of  the  pressure 
on  the  right  is  pr(x)  =  1.5  +  x.  The  initial  perturbation  of  axial  velocity 
u(y)  =  —0.025  cos(87n/)  is  specified  throughout  the  computational  domain  and 
the  source  term,  16,  17  S  =  (0,  p,  0,  0,  pu)  T,  is  added  for  both  viscous  and 
inviscid  simulations. 

The  significant  reductions  of  the  computational  time  achieved  through  the 
use  of  GPUs,  which  are  discussed  in  detail  in  the  next  section,  made  possible 
simulations  of  the  Rayleigh-Taylor  instability  with  different  mesh  densities  and 
schemes  of  different  order  of  accuracy.  A  comparisons  of  two  dimensional  simu¬ 
lations  obtained  on  a  series  of  meshes  is  shown  in  Fig.  3.  All  simulations  of  Fig. 
3  are  for  the  same  final  time.  Clearly  increase  of  the  order  of  accuracy  yields  the 
same  effect  as  doubling  of  the  mesh  density  in  both  directions.  Furthermore, 
use  of  GPUs  made  possible  three  dimensional  simulations.  An  example  of  a 
three  dimensional  simulation  obtained  on  a  relatively  coarse  240  x  60  x  60  point 
mesh  is  shown  in  Figs.  4  and  5. 

3.1  Performance  evaluation  on  Graphics  Processors 

In  order  to  evaluate  the  performance  of  our  implementation  we  have  conducted 
experiments  using  a  variety  of  combinations  among  simulation  input  settings 
and  experimental  platform  deployment.  The  experimental  platform  consisted 
of  a  host  platform  supplied  with  a  quad-core  Intel  Xeon  X5450  processor  at  the 
clock  rate  of  3.0  GHz  with  4GB  of  main  memory  and  one  NVIDIA  Tesla  SI 070 
1U  computing  system.  The  Tesla  S1070  systeml8  makes  available  4  GPUs 
overall,  with  30  stream  multiprocessors  and  240  cores  on  each  GPU,  resulting 
on  an  aggregate  of  960  cores  for  our  simulations.  Each  GPU  is  equipped  with  4 
GB  of  memory. 

In  the  following  figures  we  present  results  in  terms  of  execution  times  (Fig. 
6)  and  their  respective  speedups  (Fig.  7  and  Fig.  8)  that  were  achieved  in 
comparison  with  the  sequential  run.  The  computations  refer  to  single  precision 
arithmetic  calculations  and  the  presented  results  refer  to  the  average  of  10  sim¬ 
ulation  runs  of  the  Rayleigh-Taylor  instability  were  each  simulation  conducted 
100  iterations.  The  results  for  the  evaluation  of  oblique  inviscid  shock  reflection 
are  similar. 

According  to  the  experimentation  results  the  highest  performance  is  achieved 
when  4  GPUs  are  utilized.  The  succeeded  speedup  in  that  case  is  53  on  the 
average  for  the  several  mesh  sizes.  However  this  specific  result  is  lower  than 
the  optimal  if  we  compare  it  with  the  average  speedup  that  we  achieve  when 
we  utilize  a  single  Tesla  GPU.  This  is  mainly  due  to  the  data  transfers  that 
have  to  take  place  between  the  host  memory  and  the  several  device  memories 
on  each  time  step  of  the  simulation.  Nevertheless  the  speedup  is  still  significant 
compared  to  the  sequential  or  the  OpenMP  executions. 
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4  Conclusions 


A  high  order  WENO  finite  difference  method  was  implemented  for  parallel  pro¬ 
cessing  with  GPUs.  A  two  level  domain  decomposition  approach  was  employed 
in  order  to  achieve  highly  accelerated  processing.  On  the  first  level  the  domain 
was  decomposed  into  subdomains  that  are  assigned  to  multiple  GPUs.  On  the 
second  level  within  each  GPU  the  domain  was  further  decomposed  into  an  ap¬ 
propriate  number  of  thread  blocks.  High  resolution  simulations  of  oblique  shock 
reflection  and  Rayleigh-Taylor  instability  were  used  as  computational  examples 
for  the  evaluation  of  the  parallelization  efficiency.  It  was  found  that  a  signifi¬ 
cant,  however  yet  suboptimal,  speedup  was  achieved  with  the  available  number 
of  GPUs.  It  is  anticipated  that  better  handling  of  data  transfer  among  GPUs 
can  further  increase  processing  speed  to  optimal  levels.  It  is  also  expected  that 
implementation  of  the  existing  parallelization  approach  on  clusters  with  GPUs 
would  make  possible  large  scale  DNS  and  LES  computations. 
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Figure  1.  Schematic  for  domain  decomposition  with  MPI  or  OpenMP 
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Figure  3.  Effect  of  grid  (h-type)  refinement  and  order  of  accuracy  (p  -  type) 
refinement  on  the  resolution  of  the  Rayleigh-Taylor  instability 


Figure  4.  Numerical  simulation  of  Rayleigh-Taylor  instability  using  a 
480  x  120  x  120  point  mesh 
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Figure  5.  Numerical  simulation  of  Rayleigh-Taylor  instability  using  a 

480  x  120  x  120  point  mesh 
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Figure  6.  Execution  times  for  several  grid  sizes 
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Figure  7.  Speedup  of  OpenMP  and  GPU  implementations  compared  to  sequential  execution  I 


Figure  8.  Speedup  of  OpenMP  and  GPU  implementations  compared  to  sequential  execution  II 
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A  unified  limiting  approach  for  DG  discretiza¬ 
tions  on  mixed-type  unstructured  meshes 

Summary 

Accurate  predictions  of  skin  friction  and  thermal  loads  of  high  speed  complex 
flows,  in  both  simple  and  nontrivial  geometries,  require  good  shock  capur- 
ing  capability  and  high  resolution.  High  order  discontinuous  Galerkin  (DG) 
discretizations  possess  features  that  make  them  attractive  for  accurate  com¬ 
putation  of  complex  flows  with  strong  shocks.  A  key  ingredient  that  would 
make  the  DG  method  more  attractive  for  these  computations,  is  application 
of  p-adaptive  procedures  that  ensure  accurate  capturing  of  discontinuities 
with  low  order  expansions  and  resolution  of  smooth  complex  features,  such 
as  vortices  and  shear  layers,  with  higher  order  accuracy.  In  this  work,  a 
limiting  procedure  of  DG  discretizations  is  developed  that  is  capable  of  com¬ 
puting  high  speed  flows  with  strong  shocks  around  complex  geometries,  using 
a  p-adaptive  procedure  on  mixed  type  (quadrilateral  and  triangular)  meshes. 
The  unified  new  approach  for  limiting  high  order  expansion  of  the  approxi¬ 
mate  solution  for  mixed  type  meshes  is  presented,  and  a  number  of  results 
are  shown  to  illustrate  the  potential  of  the  method. 


Introduction 

The  Discontinuous  Galerkin  (DG)  method  has  become  popular  in  recent 
years  due  to  its  ability  to  simulate  flows  around  complex  geometries  and 
because  it  offers  the  advantage  of  increased  order  of  approximation  with  a 
compact  stencil.  As  a  result,  it  yields  high  parallelization,  both  with  domain 
decomposition  and  graphic  processor  units  (GPUs).  However,  an  aspect  of 
the  DG  method  that  is  not  satisfactory  as  yet,  is  the  ability  to  compute  flows 
with  discontinuities  for  arbitrary  mixed  type  meshes  and  three  dimensional 
computations.  Usually,  the  DG  method  reduces  the  solution  accuracy  near 
discontinuities.  This  affects  the  resolution  properties  of  the  method  and  much 
of  its  potential  is  lost.  Therefore,  use  of  p-adaptive  procedures  is  required  to 
enhance  utilization  of  the  method  for  flows  with  discontinuities. 

A  TVB  limiting  procedure  for  the  DG  higher  order  expansions  was  in¬ 
troduced  for  scalar  one  dimensional  hyperbolic  conservation  law  by  Shu  [1] 
and  Cockburn  and  Shu  [2] .  The  extension  of  the  method  to  one-dimensional 
systems  has  been  applied  with  satisfactory  results  [3] .  The  basic  idea  behind 
was  to  use  a  flux  limiter  to  preserve  the  monotonicity  of  the  solution  aver- 
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ages.  The  nonlinear  limiting  was  applied  in  two  dimensional  problems  [4-6], 
for  second  (P1)  and  third  (P2)  order  accurate  computations  and  for  rectan¬ 
gular  quatrilateral  and  triangular  meshes.  Qiu  and  Shu  [7]  applied  the  same 
TVB  limiter  for  the  one  dimensional  Euler  equations,  but  reconstructed  the 
solution  at  every  element  using  a  Hermite  WENO  scheme  using  the  average 
solution  at  every  element,  and  achieving  a  third  order  monotone  solution. 
Similar  approaches  were  followed  by  Zhu  and  Qiu  [8,  9],  and  [10].  These 
limiting  approaches  are  also  restricted  to  meshes  with  rectangular  elements 
and  for  triangular  meshes  with  the  use  of  an  elaborate  approach  that  is  not 
straight  forward  to  extend  in  three  dimensions.  The  limiting  approaches  of 
[7-10],  however,  increase  the  stencil  width  and  one  of  the  main  advantages  of 
the  DG  method  is  lost.  Furthermore,  there  is  an  ambiguity  of  the  procedure 
regarding  the  solution’s  second  derivative,  and  the  case  of  distorted  meshes 
has  not  been  addressed. 

Biswas  [11]  proposed  an  alternative  implementation  of  limiting,  where  the 
solution  moments  are  altered  in  order  to  preserve  monotonicity.  Krivodonova 
[12]  applied  also  the  same  concept  with  promising  results,  in  computations 
for  flows  with  strong  shocks  and  with  order  of  accuracy  higher  than  two. 
But,  the  meshes  employed  for  these  approaches  did  not  have  any  distortion 
and  used  only  rectangular  elements. 

The  proposed  approach  uses  the  basic  TVB  limiter  of  Cockburn  and 
Shu  [2],  and  overcomes  limitations  associated  with  canonical  quadrilateral 
meshes,  it  resolves  the  ambiguity  concerning  the  second  derivative  of  the 
solution  and  works  very  effectively  on  distorted  mixed  type  meshes. 


Unified  local  projection  limiting 

A  spatial  DG  discretization  of  the  Euler  equations  of  gas  dynamics  is  con¬ 
sidered.  A  P 1  approximation  of  the  solution  is  applied,  using  modal  bases, 
which  are  constructed  by  the  tensor  product  of  Legendre  polynomials  [13]  on 
the  standard  square  reference  element  of  Fig.  1.  Arbitrary  quadrilaterals  of 
the  physical  space  (x,  y)  transform  to  the  standard  square  element  in  the  ref¬ 
erence  space  (n,i,n2).  Triangles  in  the  physical  space  (see  Fig.  2)  transform 
to  right  triangles  in  the  (Q ,  <Q)  space  and  then  to  the  standard  square  in  the 
(«i,  n2)  space,  through  the  collapsed  coordinate  transformation  [13].  Details 
of  the  quadrilateral  ans  triangle  transformations  are  shown  in  Figs.  1  and  2, 
respectively. 

The  current  approach  preserves  the  monotonicity  of  the  solution  in  a  TVB 
sense,  by  applying  a  slope  limiting  procedure,  which  is  transparent  to  the 
element  type.  Specifically,  all  the  limiting  operations  are  performed  for  every 
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element  in  the  reference  space  (n\,  n2)  for  the  standard  square  configuration, 
where  it  is  required  that  the  slope  of  the  solution  at  the  element  edges  does 
not  exceed  the  variation  of  the  mean  solution  across  them. 


Figure  1:  Transformation  of  the  physical  quadrilateral  ( x ,  y )  to  the  standard 
rectangular  element  configuration  (ni,n2  G  [—1, 1]). 


In  two  dimensions,  every  element  is  checked  for  limiting  in  each  direction 
ni,  n2  of  the  reference  space  by  using  the  TVB  limiter  proposed  by  Shu  [1]: 


m(ai,a2,  •  •  •  ,an ) 


/ 

«i  if  ai|  <  ML2x  y 

|  m(a1,  a2,  ■  ■ 

■  ,  an)  otherwise, 

(1) 


where  M  >  0  is  a  parameter  that  estimates  the  second  derivative  of  the 
solution,  and  Lx  y  a  characteristic  local  mesh  length  in  the  x  and  y  direction  at 
every  element.  The  function  m(ai,  a2,  •  •  •  ,  an),  is  the  usual  minmod  function: 


?Tl(o(.i  1  '  '  '  1  Q>n) 


s  •  mini<j<n\a,j\  if  sign(ai) 
0  otherwise. 


sign(an)  =  s, 


(2) 


The  limiting  is  performed  in  the  characteristic  space,  by  computing  the  di¬ 
rectional  Jacobian  using  the  Roe  averaged  variables.  An  element  is  “flagged “ 
for  limiting,  if  the  limiter  in  Eq.  (1)  returns  an  argument  other  than  the  first. 

For  the  effective  use  of  the  limiter,  it  is  crucial  to  have  a  good  estimate 
of  the  second  order  derivative  of  the  solution  and  of  the  local  mesh  lengths. 
The  characteristic  local  lengths  are  computed  as  in  [15]: 

Lx  ^  ]  /x,e^j/e,  Ly  —  ^  ]  /y)SAxe,  (3) 

e  e 

where  fx,  fy  are  the  following  grid  functions: 

fx  =  Axec\Axec\,  fy  =  Ayec\Ayec\.  (4) 
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(-1,1) 


Figure  2:  Transformation  of  the  physical  triangle  to  the  standard  triangular 
element  (£i,£2  £  [ — 1, 1]  with  £i+£2  <  1)  and  standard  quadrilateral  element 
configuration  (711,712  €  [—1,1]). 


The  parameter  M  in  our  approach  is  not  chosen  arbitrarily  as  in  Shu  [3], 
where  a  constand  value  of  M=50  is  used,  but  is  set  equal  to  the  absolute 
value  of  the  estimate  of  the  Laplacian  operator  for  the  element  k  where 
limiting  is  applied: 


M 


^(«e  -  Uk) 


(5) 


e 


where  with  u  is  denoted  the  mean  value  of  the  solution,  and  the  index  e 
counts  all  the  elements  sharing  the  same  edges  with  element  k. 


Limiting  Quadrilateral  elements 

The  mapping  of  a  quadrilateral  element  to  the  computational  space  is  shown 
in  Fig.  1  1.  For  a  second  order  accurate  solution,  P 1  approximation,  the 
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following  expansion  of  the  solution  is  used: 

«(x,  t)  =  c0(t)b0(x)  +  ci  (t)bi  (x)  +  c2(f)62(x)  +  c3(t)b3x ,  (6) 

where  6j(x),c*(t),  z  =  0,1,...,  3  are  the  expansion  bases  and  the  coefficients 
of  the  expansion,  respectively.  For  the  standard  square  element,  the  modal 
bases  with  respect  to  the  local  Cartesian  system  (ni,n2),  of  the  reference 
element  are: 


bo  =  1) 
bi  =  n  i, 
b2  =  n2, 
b3  =  nin2 


The  expansion  coefficients  are  the  solution  moments,  and  correspond  to  the 
derivatives  ux,uy  and  uxy ,  respectively.  In  order  to  keep  the  solution  mono¬ 
tone,  as  much  as  possible,  it  is  imperative  to  limit  the  coefficients  correspond¬ 
ing  to  the  Pl  part  of  the  solution.  The  midpoints  of  the  edges  are  chosen  for 
limiting  of  the  coefficients,  for  the  following  reasons: 

•  Avoid  ambiguity  and  closure  problems. 

•  Computational  efficiency 

Along  the  n\  direction  coefficient  Ci  ~  ux  is  limited.  For  doing  so,  the  P 1 
part  of  the  solution  at  the  midpoints  of  the  edges  BC  and  AD  (see  Fig.  3)  is 
used.  According  to  Eq.  (6): 


Ubc  =  u(l,0)  -  c0  =  cu 

(7a) 

UAd  =  «(-l,  0)  -  c0  =  -cu 

(7b) 

and  the  value  for  the  coefficient  Ci  is: 

Ubc  ~  Uad 
Cl  = - 2 - 

Limiting  of  the  solution  along  the  rii  direction  is  performed  by  limiting  the 
values  Ubc  and  Uad  at  the  midpoints  of  the  edges  as  follows: 

Ubc,(i,3)  ’  Q),(i+i ,j)  Q),(i,j) ?  Q),(*p)  i,i))j  (9a) 

U AD,(i,j)  ^n(U AD,(i,j) j  CQ,(i,j+l)  Q),(ip— 1)))  (9b) 


(8) 
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Figure  3:  Stencil  for  limiting  over  the  standard  square  element. 


and  the  new  value  of  c\  is  computed  by  Eq.  (8).  Similar  methodology  is 
followed  for  the  c 2  ~  uy  coefficient. 

If  a  quadrilateral  element  is  flagged  for  limiting,  then  the  coefficient 
c3  ~  uxy  is  set  to  zero,  and  moreover,  the  residual  associated  with  the  cor¬ 
responding  base  is  set  to  zero,  during  a  multi-step  time  marching  method. 
The  rationale  behind  this  is  that,  as  long  as,  the  coefficient  c3  is  set  to  zero, 
it  must  be  kept  zero  along  the  stages  between  time  steps. 

Limiting  Triangular  elements 

In  Fig.  2,  the  mapping  of  a  triangular  element  in  the  reference  computational 
domain  is  given.  Applying  a  P 1  approximation  over  the  elemental  domain, 
the  following  expansion  of  the  solution  is  used: 

u(x,t)  =  co(i)&0(x)  +  ci(t)6i(x)  +  c2(f)62(x).  (10) 

Referring  to  Fig.  2,  the  modal  bases  relative  to  the  Cartesian  system  (ni,  n2) 
of  the  reference  element  are: 


bo  —  1, 

,  1  -  n2 

bl  =  Ul~T~ 

3 77/2  +  1 
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The  coefficients  c\  and  C2  correspond  to  ux  and  uy,  respectively.  The  mid¬ 
points  of  the  edges  are  chosen  for  the  limiting  procedure,  and  a  similar  proce¬ 
dure  as  in  the  quadrilaterals  is  followed.  Specifically,  along  the  rq  direction, 
coefficient  c\  is  altered,  by  limiting  the  Pl  part  of  the  solution.  Then,  from 
the  expansion  given  in  Eq.  (10),  coefficient  c\  is  computed  as: 


ci  —  Ubc  ~  Uad ■  (11) 

Along  the  n2  direction  coefficient  C2  is  limited,  but  as  a  collapsed  Cartesian 
system  is  used  for  representing  the  bases  over  the  standard  square  element, 
the  P 1  part  of  the  solution  across  the  edge  AB  in  Fig.  3  is  limited  only 
with  the  mean  solution  variation  across  that  edge.  The  final  value  of  the  c2 
coefficient  is: 

C2  i  C 0,(i,j)  Q),(i  J— 1) ,  ^0,(i,j)  ^0,(i,j— !))•  (12) 


Results 

A  third  order  explicit  TVD  Runge-Kutta  scheme  [14]  is  used  for  time  march¬ 
ing.  The  new  limiting  procedure  is  applied  for  several  cases  with  triangular 
and  quadrilateral  meshes  with  arbitrary  (not  rectangular)  shape.  Some  ex¬ 
amples  are  shown  next. 


Oblique  shock  reflection 

The  problem  of  an  oblique  shock  at  Mach  3  is  solved.  The  domain  is  a  4x1 
rectangle.  On  the  top,  the  flow  is  specified  with  the  oblique  shock  relations 
for  a  shock  at  M  =  3  and  an  angle  6  =  30°.  At  the  inflow  uniform  supersonic 
flow  is  specified.  On  the  wall  y  =  0  the  slip  velocity  condition  is  applied,  while 
density  and  pressure  are  extrapolated  assuming  zero  normal  gradient.  At  the 
supersonic  outflowm  all  quantities  are  extrapolated  from  the  interior.  A  mesh 
of  200  x  50  triangular  elements  has  been  used.  The  pressure  contours  are 
shown  in  Figs.  4a, 4b.  Good  capturing  of  both  the  incident  and  the  reflected 
shocks  is  achieved.  In  Fig.  5,  the  fine  plot  of  the  normalized  pressure  along 
the  center  of  the  domain  y  =  \  is  shown. 

Supersonic  flow  over  a  cylinder 

Supersonic  flow  at  a  free  stream  Mach  number  M  =  2  over  a  cylinder  is 
considered  next.  Both  structured  and  unstructured  meshes  have  been  used. 
The  meshes  used  in  the  computations  of  the  flow  are  shown  in  Fig  6,  and  they 
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consist  of  2400  and  4800  elements  of  quadrilaterals  and  triangles,  respectively. 
Due  to  the  symmetry  of  the  problem  only  half  of  the  domain  is  considered. 

In  Fig.  7,  the  pressure  contours  are  shown.  The  agreement  between 
the  numerical  solutions  computed  on  different  meshes  appears  satisfactory. 
Computed  pressure  and  velocity  magnitude  along  the  stagnation  line  are 
compared  in  Figs.  8  and  9,  respectively.  Good  quantitative  agreement  be¬ 
tween  the  computed  results  is  obtained. 


Conclusions 

A  new  limiting  procedure  for  DG  discretizations  has  been  developed  and 
applied  for  distorted  meshes  with  arbitrary  quadrilaterals  and  triangular 
meshes.  The  proposed  approach  applies  to  mixed-type  meshes  and  it  is 
straight  forward  to  extend  in  three  dimensions.  Numerical  solutions  for  tri¬ 
angular  and  quadrilateral  meshes  were  presented.  Numerical  solutions  of 
viscous  flows  with  shocks  will  be  computed  using  a  p- adaptive  procedure. 
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Figure  4a:  Pressure  contours  for  the  oblique  shock  problem. 


Figure  4b:  Pressure  contours  for  the  oblique  shock  problem  (Detail). 


Figure  5:  Pressure  plot  at  y  =  \  for  the  oblique  shock  problem. 
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Figure  6:  Meshes  used  of  a  Mach  2  flow  over  a  cylinder:  (a)  Triangles,  (b) 
Quadrilaterals. 


Figure  7:  Pressure  contours  for  a  Mach  2  flow  over  a  cylinder:  (a)Triangles 
,  (b)  Quadrilaterals. 


12 


94 


Velocity  maagnitude  Pressure 
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Figure  8:  Pressure  plot  along  the  stagnation  line. 


Figure  9:  Velocity  magnitude  along  the  stagnation  line. 
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Parallelization  of  the  DG  method  for  the  Navier- 
Stokes  Equations 

Summary 

Accurate  predictions  of  skin  friction  and  thermal  loads  of  high  speed  complex 
flows,  in  both  simple  and  nontrivial  geometries,  require  good  shock  capturing 
capability  and  high  accuracy  in  the  viscous  flow  region.  High  order  discontin¬ 
uous  Galerkin  (DG)  discretizations  possess  features  that  make  them  attrac¬ 
tive  for  accurate  computation  of  complex  flows  with  strong  shocks.  A  key 
ingredient  that  would  make  the  DG  method  more  attractive  for  these  com¬ 
putations,  is  application  of  p-adaptive  procedures  that  ensure  accurate  cap¬ 
turing  of  discontinuities  with  low  order  expansions  and  resolution  of  smooth 
complex  features,  such  as  vortices  and  shear  layers,  with  higher  order  ac¬ 
curacy.  A  limiting  procedure  of  DG  discretizations  capable  of  commuting 
high  speed  flows  with  strong  shocks  around  complex  geometries,  using  a  p- 
adaptive  procedure  on  mixed  type  (quadrilateral  and  triangular)  meshes  was 
developed  and  preliminary  results  were  presented  in  the  previous  report.  In 
this  report  further  validation  of  the  limiting  approach  is  presented  for  stan¬ 
dard  computationally  demanding  flow  cases,  such  as  the  Mach  reflection  and 
the  wind  tunnel  with  a  step.  Application  examples  for  both  quadrilateral, 
triangular  and  mixed  type  meshes  are  shown. 

These  large  scale  computations  were  made  possible  by  parallelizing  the 
code  using  domain  decomposition  and  MPI.  Viscous  terms  were  added  for  the 
DG  method  using  the  local  DG  approach.  Incorporation  of  an  implicit  time 
marching  scheme,  which  will  make  possible  high  Reynolds  number  compu¬ 
tations,  is  underway  for  a  single  processor.  Implementation  of  implicit  time 
marching  with  domain  decomposition  is  more  involved  and  requires  use  of  a 
dual  time  stepping  for  time  accurate  computations.  Developments  of  capabil¬ 
ities  to  compute  viscous  flow  numerical  solutions  with  implicit  time  stepping 
for  multiprocessor  will  be  presented  in  the  next  report. 
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Parallelization 


The  compact  form  of  the  DG  discretization  stencil  (information  is  needed 
only  from  the  immediate  neighbors)  permits  efficient  implementation  on  par¬ 
allel  processors.  The  data  structure  of  the  code  was  designed  in  order  to 
meet  the  requirements  of  the  DG  method  on  mixed  type  element  meshes  and 
p-adaptive  expansions.  Moreover,  it  allows  application  of  the  domain  decom¬ 
position  method  through  MPI  without  serious  additional  coding  complexity. 

Every  simulation  with  domain  decompositions  initiates  with  the  par¬ 
titioning  of  the  computational  domain  using  free  available  software  Metis 
(http:/ /glaros. dtc.umn.edu/gkhome/views/metis).  In  order  to  handle  mixed 
type  meshes,  a  graph  file  representing  the  mesh  is  initially  created  and  subse¬ 
quently  partitioning  of  the  graph  in  to  subdomains  leads  to  the  final  decom¬ 
position  of  the  mesh.  Every  partition  (subdomain)  includes  layers  of  elements 
belonging  to  neighboring  partitions.  This  set  of  elements  is  the  receiving  list 
for  a  partition,  and  the  set  of  elements  of  the  partition  sharing  the  same 
edges  with  the  receiving  list,  is  the  sending  list  to  neighboring  partitions. 

A  TVD  Runge-Kutta  method  was  used  for  time  marching.  Limiting  is 
applied  after  every  stage  of  the  RK  cycle.  Finally,  an  exchange  between 
the  partitions  of  the  solution  is  also  performed  at  the  end  of  each  stage. 
For  a  parallel  p-adaptive  DG  code,  all  coefficients  of  the  expansion  for  the 
highest  order  polynomial  approximation  are  exchanged  between  partitions  of 
the  mesh. 

The  parallel  algorithm  for  every  partition  is  as  follows: 

1 .  Transfer  the  solution  (expansion  coefficients)  at  the  end  of  the  RK  cycle 
between  partitions.  Wait  for  the  transfer  to  complete. 

2.  Apply  the  limiter. 

3.  Transfer  of  all  coefficients  of  the  highest  polynomial  application.  Wait 
for  the  transfer  to  complete. 

4.  Assign  solution  coefficients  from  PO  up  to  the  highest  polynomial  ap¬ 
plication. 

5.  Compute  the  volume  integrals. 

6.  Compute  the  line  integrals. 

7.  Compute  residual  for  every  element  belonging  to  the  partition. 

8.  Enter  the  next  RK  stage. 

9.  Complete  the  RK  cycle  and  repeat. 
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Results 


Results  from  parallel  computations  of  flows  with  strong  shocks  on  single-  and 
mixed- type  meshes  are  presented  next.  The  supersonic  flow  over  a  cylinder 
is  the  first  example.  The  single  type  computational  mesh  with  triangular 
elements  and  the  computed  density  field  for  flow  at  M=2.0  is  are  shown  in 
Figs.  1-2.  The  same  flow  was  computed  on  a  mesh  with  arbitrary  quadri¬ 
lateral  elements  (see  Figs.  3  and  4).  This  computation  was  not  possible 
with  the  original  limiting  procedure  suggested  by  Cockburn  and  Shu  [1]  and 
difficult  to  implement  with  limiting  approaches  presented  in  Ref.  2-6,  but 
it  is  straightforward  with  our  new  limiting  approach.  Next,  an  example  of 
a  computation  on  a  mixed-type  mesh  is  presented.  The  mesh  and  the  com¬ 
puted  density  field  is  shown  in  Fig.  5.  For  this  computation,  a  p-adaptive 
procedure  is  applied  where  the  flow  near  the  discontinuity  is  computed  with 
PI  polynomials  as  flagged  by  the  limiter  while  the  rest  of  the  smooth  flow 
field  is  computed  with  P2  polynomials.  A  comparison  of  the  computed  den¬ 
sity  along  the  symmetry  line  is  shown  in  Fig.  6.  The  agreement  with  the 
quadrilateral  mesh  solution  is  excellent. 

Computations  for  two  classical  examples  are  presented  next.  These  are 
the  M=3.0  tunnel  with  a  step  and  the  double  Mach  reflection  of  a  strong 
shock  at  Mach=10  (P.  Woodward  and  P.  Colella,  ’’The  numerical  simulation 
of  two-dimensional  fluid  flow  with  strong  shocks,”  Journal  of  Computational 
Physics,  Vol.  54,  No.  1,  1984,  pp.  115-173).  The  computational  domain  for 
the  tunnel  with  a  step  discretized  with  a  mixed  type  mesh  and  the  partition 
of  this  mesh  to  subdomains  for  parallel  processing  is  shown  in  Fig.  7.  At 
the  corner  the  mesh  was  refined  in  order  to  diminish  the  Mach  stem  that  is 
created  from  the  artificial  entropy  layers  caused  by  the  sharp  corner.  The 
computed  flow  field  and  the  elements  flagged  for  limiting  are  shown  in  Fig. 
8.  The  elements  flagged  for  limiting  for  the  double  Mach  reflection  problem 
computation  are  shown  in  Fig.  9  and  the  computed  flow  field  is  shown 
in  Fig.  10.  For  these  computations  the  elements  flagged  for  limiting  are 
updated  continuously  during  the  time  accurate  computation.  Computations 
with  higher  order  expansions  using  p-adaptive  procedures  are  underway  and 
they  will  presented  at  the  AIAA  ASM  meeting  in  January  2011. 


Conclusions  and  Outlook 

The  DG  code  was  parallelized  using  domain  decomposition  and  MPI.  The 
new  limiting  procedure  for  DG  discretizations  was  then  applied  for  standard 
computationally  demanding  test  problems.  The  proposed  approach  applied 
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to  quadrilateral,  triangular,  and  mixed-type  meshes.  The  extension  of  the 
limiting  approach  in  three  dimensions  is  underway.  Numerical  solutions  of 
viscous  flows  with  shocks  will  be  computed  using  a  p- adaptive  procedure. 
For  high  Reynolds  number  flows  and  long  time  integration  with  small  size 
meshes  use  of  an  implicit  time  marching  scheme  is  necessary. 
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Figure  1.  Triangular  element  mesh  over  the  cylinder 


Figure  2.  Computed  density  field  at  M=2. 
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Figure  3.  Quadrilateral  element  mesh  over  the  cylinder 


Figure  4.  Computed  density  field  at  M=2. 


7 

103 


Density 


A 


Figure  5.  Mixed-type  mesh  for  the  computation  of  supersonic  flow. 


Figure  6  Comparison  of  the  density  along  the  symmetry  stagnation  line 
obtained  with  quadrilateral  (PI)  and  mixed  type  meshes  (p2  adaptive). 
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(elements  flagged  for  limiting) 


Figure  8.  Elements  flagged  for  limiting  (top)  and  computed  density  and 
pressure  for  the  flow  in  a  tunnel  with  a  step. 
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Figure  9.  Elements  flagged  for  limiting 


Figure  11.  Computed  density  field. 


